Abstract

In the quantum state tomography problem, one wishes to estimate an unknown $d$-dimensional
mixed quantum state $\rho$, given few copies. We show that $O(d/\epsilon)$ copies suffice to obtain an
estimate $\hat{\rho}$ that satisfies $\|\hat{\rho} - \rho\|_F^2 \le \epsilon$ (with high probability). An immediate consequence is that
$O(\mathrm{rank}(\rho) \cdot d/\epsilon^2) \le O(d^2/\epsilon^2)$ copies suffice to obtain an $\epsilon$-accurate estimate in the standard trace
distance. This improves on the best known prior result of $O(d^3/\epsilon^2)$ copies for full tomography,
and even on the best known prior result of $O(d^2 \log(d/\epsilon)/\epsilon^2)$ copies for spectrum estimation.
Our result is the first to show that nontrivial tomography can be obtained using a number of
copies that is just linear in the dimension.

Next, we generalize these results to show that one can perform efficient principal component
analysis on $\rho$. Our main result is that $O(kd/\epsilon^2)$ copies suffice to output a rank-$k$ approximation
$\hat{\rho}$ whose trace distance error is at most $\epsilon$ more than that of the best rank-$k$ approximator to $\rho$.
This subsumes our above trace distance tomography result and generalizes it to the case when $\rho$
is not guaranteed to be of low rank. A key part of the proof is the analogous generalization
of our spectrum-learning results: we show that the largest $k$ eigenvalues of $\rho$ can be estimated
to trace-distance error $\epsilon$ using $O(k^2/\epsilon^2)$ copies. In turn, this result relies on a new coupling
theorem concerning the Robinson-Schensted-Knuth algorithm that should be of independent
combinatorial interest.


1 Introduction 

Quantum state tomography refers to the task of estimating an unknown $d$-dimensional
mixed quantum state, $\rho$, given the ability to prepare and measure $n$ copies, $\rho^{\otimes n}$. It is of enormous
practical importance for experimental detection of entanglement and the verification of quantum
technologies. For an anthology of recent advances in the area, the reader may consult [BCG13]. As
stated in its introduction,

The bottleneck limiting further progress in estimating the states of [quantum] systems 
has shifted from physical controllability to the problem of handling. . . the exponential 
scaling of the number of parameters describing quantum many-body states. 

Indeed, a system consisting of $b$ qubits has dimension $d = 2^b$ and is described by a density matrix
with $d^2 = 4^b$ complex parameters. For practical experiments with, say, $b \le 10$, it is imperative
to use tomographic methods in which $n$ grows as slowly as possible with $d$. For 20 years or so,
the best known method used $n = O(d^4)$ copies to estimate $\rho$ to constant error; just recently this
was improved [KRT14] to $n = O(d^3)$. Despite the practical importance and mathematical elegance
of the quantum tomography problem, the optimal dependence of $n$ on $d$ remained “shockingly
unknown” [Har15] as of early 2015.

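In this work we analyze known measurements arising from the representation theory of the
symmetric and general linear groups $\mathfrak{S}(n)$ and $\mathrm{GL}_d = \mathrm{GL}_d(\mathbb{C})$: specifically, the “Empirical
Young Diagram (EYD)” measurement considered by [ARS88, KW01], followed by Keyl’s [KW01,
Key06] state estimation measurement based on projection to highest weight vectors. The former
produces a random height-$d$ partition $\boldsymbol{\lambda} \vdash n$ according to the Schur-Weyl distribution $\mathrm{SW}^n(\alpha)$,
which depends only on the spectrum $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_d$ of $\rho$; the latter produces a random
$d$-dimensional unitary $\boldsymbol{U}$ according to what may be termed the Keyl distribution $K_{\boldsymbol{\lambda}}(\rho)$. Writing
$\underline{\lambda}$ for $(\lambda_1/n, \ldots, \lambda_d/n)$, we show the following results:

Theorem 1.1. $\displaystyle \mathop{\mathbf{E}}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)} \|\underline{\boldsymbol{\lambda}} - \alpha\|_2^2 \le \frac{d}{n}$.

Theorem 1.2. $\displaystyle \mathop{\mathbf{E}}_{\substack{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha) \\ \boldsymbol{U} \sim K_{\boldsymbol{\lambda}}(\rho)}} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_F^2 \le \frac{4d-3}{n}$.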

In particular, up to a small constant factor, full tomography is no more expensive than spectrum 
estimation. These theorems have the following straightforward consequences: 

Corollary 1.3. The spectrum of an unknown rank-$r$ mixed state $\rho \in \mathbb{C}^{d \times d}$ can be estimated to
error $\epsilon$ in $\ell_2$-distance using $n = O(r/\epsilon^2)$ copies, or to error $\epsilon$ in total variation distance using
$n = O(r^2/\epsilon^2)$ copies.

Corollary 1.4. An unknown rank-$r$ mixed state $\rho \in \mathbb{C}^{d \times d}$ may be estimated to error $\epsilon$ in Frobenius
distance using $n = O(d/\epsilon^2)$ copies, or to error $\epsilon$ in trace distance using $n = O(rd/\epsilon^2)$ copies.

(These bounds are with high probability; confidence $1 - \delta$ may be obtained by increasing the number of copies
by a factor of $\log(1/\delta)$.)

The previous best result for spectrum estimation [HM02, CM06] used $O(r^2 \log(r/\epsilon)/\epsilon)$ copies
for an $\epsilon$-accurate estimation in KL-divergence, and hence $O(r^2 \log(r/\epsilon)/\epsilon^2)$ copies for an $\epsilon$-accurate
estimation in total variation distance. The previous best result for tomography is the very
recent [KRT14, Theorem 2], which uses $n = O(rd/\epsilon^2)$ for an $\epsilon$-accurate estimation in Frobenius
distance, and hence $n = O(r^2 d/\epsilon^2)$ for trace distance.

As for lower bounds, it follows immediately from [FGLE12, Lemma 5] and Holevo’s bound that
$\widetilde{\Omega}(rd)$ copies are necessary for tomography with trace-distance error $\epsilon_0$, where $\epsilon_0$ is a universal
constant. (Here and throughout $\widetilde{\Omega}(\cdot)$ hides a factor of $\log d$.) Also, Holevo’s bound combined
with the existence of $2^{\Omega(d)}$ almost-orthogonal pure states shows that $\widetilde{\Omega}(d)$ copies are necessary for
tomography with Frobenius error $\epsilon_0$, even in the rank-1 case. Thus our tomography bounds are
optimal up to at most an $O(\log d)$ factor when $\epsilon$ is a constant. (Conversely, for constant $d$, it is easy
to show that $\Omega(1/\epsilon^2)$ copies are necessary even just for spectrum estimation.) Finally, we remark
that $\widetilde{\Omega}(d^2)$ is a lower bound for tomography with Frobenius error $\epsilon = \Theta(1/\sqrt{d})$; this also matches
our $O(d/\epsilon^2)$ upper bound. This last lower bound follows from Holevo and the existence [Sza82] of
$2^{\Omega(d^2)}$ normalized rank-$d/2$ projectors with pairwise Frobenius distance at least $\Omega(1/\sqrt{d})$.

1.1 Principal component analysis 

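Our next results concern principal component analysis (PCA), in which the goal is to find the best
rank-$k$ approximator to a mixed state $\rho \in \mathbb{C}^{d \times d}$, given $1 \le k \le d$. Our algorithm is identical
to the Keyl measurement from above, except rather than outputting $\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger$, it outputs
$\boldsymbol{U} \mathrm{diag}^{(k)}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger$ instead, where $\mathrm{diag}^{(k)}(\underline{\lambda})$ means $\mathrm{diag}(\underline{\lambda}_1, \ldots, \underline{\lambda}_k, 0, \ldots, 0)$. Writing
$\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_d$ for the spectrum of $\rho$, our main result is:

Theorem 1.5. $\displaystyle \mathop{\mathbf{E}}_{\substack{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha) \\ \boldsymbol{U} \sim K_{\boldsymbol{\lambda}}(\rho)}} \|\boldsymbol{U} \mathrm{diag}^{(k)}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_1 \le \alpha_{k+1} + \cdots + \alpha_d + 6\sqrt{\frac{kd}{n}}$.

As the best rank-$k$ approximator to $\rho$ has trace-distance error $\alpha_{k+1} + \cdots + \alpha_d$, we may immediately
conclude:

Corollary 1.6. Using $n = O(kd/\epsilon^2)$ copies of an unknown mixed state $\rho \in \mathbb{C}^{d \times d}$, one may find
a rank-$k$ mixed state $\hat{\rho}$ such that the trace distance of $\hat{\rho}$ from $\rho$ is at most $\epsilon$ more than that of the
optimal rank-$k$ approximator.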



Since $\alpha_{k+1} = \cdots = \alpha_d = 0$ when $\rho$ has rank $k$, Corollary 1.6 strictly generalizes the
trace-distance tomography result from Corollary 1.4. We also remark that one could consider performing
Frobenius-norm PCA on $\rho$, but it turns out that this is unlikely to give any improvement in copy
complexity over full tomography; see Section 6 for details.

As a key component of our PCA result, we investigate the problem of estimating just the
largest $k$ eigenvalues, $\alpha_1, \ldots, \alpha_k$, of $\rho$. The goal here is to use a number of copies depending only
on $k$ and not on $d$ or $\mathrm{rank}(\rho)$. We show that the standard EYD algorithm achieves this:


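Theorem 1.7. $\displaystyle \mathop{\mathbf{E}}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)} d_{\mathrm{TV}}^{(k)}(\underline{\boldsymbol{\lambda}}, \alpha) \le \frac{1.92\,k + .5}{\sqrt{n}}$,
where $d_{\mathrm{TV}}^{(k)}(\beta, \alpha)$ denotes $\frac{1}{2} \sum_{i=1}^{k} |\beta_i - \alpha_i|$.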


From this we immediately get the following strict generalization of (the total variation distance 
result in) Corollary 1.3: 

Corollary 1.8. The largest $k$ eigenvalues of an unknown mixed state $\rho \in \mathbb{C}^{d \times d}$ can be estimated
to error $\epsilon$ in total variation distance using $n = O(k^2/\epsilon^2)$ copies.


The fact that this result has no dependence on the ambient dimension d or the rank of p may 
make it particularly interesting in practice. 


1.2 A coupling result concerning the RSK algorithm 

For our proof of Theorem 1.7, we will need to establish a new combinatorial result concerning 
the Robinson-Schensted-Knuth (RSK) algorithm applied to random words. We assume here the 
reader is familiar with the RSK correspondence; see Section 2 for a few basics and, e.g., [Ful97] for 
a comprehensive treatment. 

Notation 1.9. Let $\alpha$ be a probability distribution on $[d] = \{1, 2, \ldots, d\}$, and let $\boldsymbol{w} \in [d]^n$ be a
random word formed by drawing each letter $\boldsymbol{w}_i$ independently according to $\alpha$. Let $\boldsymbol{\lambda}$ be the shape
of the Young tableaus obtained by applying the RSK correspondence to $\boldsymbol{w}$. We write $\mathrm{SW}^n(\alpha)$ for
the resulting probability distribution on $\boldsymbol{\lambda}$.

Notation 1.10. For $x, y \in \mathbb{R}^d$, we say $x$ majorizes $y$, denoted $x \succ y$, if $\sum_{i=1}^{k} x_{[i]} \ge \sum_{i=1}^{k} y_{[i]}$ for all
$k \in [d] = \{1, 2, \ldots, d\}$, with equality for $k = d$. Here the notation $x_{[i]}$ means the $i$th largest value
among $x_1, \ldots, x_d$. We also use the traditional notation $\lambda \trianglerighteq \mu$ instead when $\lambda$ and $\mu$ are partitions
of $n$ (Young diagrams).

In Section 7 we prove the following theorem. The proof is entirely combinatorial, and can be 
read independently of the quantum content in the rest of the paper. 

Theorem 1.11. Let $\alpha, \beta$ be probability distributions on $[d]$ with $\beta \succ \alpha$. Then for any $n \in \mathbb{N}$ there
is a coupling $(\boldsymbol{\lambda}, \boldsymbol{\mu})$ of $\mathrm{SW}^n(\alpha)$ and $\mathrm{SW}^n(\beta)$ such that $\boldsymbol{\mu} \trianglerighteq \boldsymbol{\lambda}$ always.

1.3 Independent and simultaneous work.

Independently of and simultaneously with our work, Haah et al. [HHJ+15] have given a slightly different
measurement that also achieves Corollary 1.4, up to a log factor. More precisely, their measurement
achieves error $\epsilon$ in infidelity with $n = O(rd/\epsilon) \cdot \log(d/\epsilon)$ copies, or error $\epsilon$ in trace distance with
$n = O(rd/\epsilon^2) \cdot \log(d/\epsilon)$ copies. They also give a lower bound of $n \ge \Omega(rd/\epsilon^2)/\log(d/r\epsilon)$ for
quantum tomography with trace distance error $\epsilon$. After seeing a draft of their work, we observed
that their measurement can also be shown to achieve expected squared-Frobenius error $\frac{4d-3}{n}$, using
the techniques in this paper; the brief details appear at [Wri15].

1.4 Acknowledgments. 

We thank Jeongwan Haah and Aram Harrow (and by transitivity, Vlad Voroninski) for bringing
[KRT14] to our attention. We also thank Aram Harrow for pointing us to [Key06]. The
second-named author would also like to thank Akshay Krishnamurthy and Ashley Montanaro for 
helpful discussions. 

2 Preliminaries 

We write $\lambda \vdash n$ to denote that $\lambda$ is a partition of $n$; i.e., $\lambda$ is a finite sequence of integers $\lambda_1 \ge \lambda_2 \ge
\lambda_3 \ge \cdots$ summing to $n$. We also say that the size of $\lambda$ is $|\lambda| = n$. The length (or height) of $\lambda$,
denoted $\ell(\lambda)$, is the largest $d$ such that $\lambda_d \ne 0$. We identify partitions that only differ by trailing
zeroes. A Young diagram of shape $\lambda$ is a left-justified set of boxes arranged in rows, with $\lambda_i$ boxes
in the $i$th row from the top. We write $\mu \nearrow \lambda$ to denote that $\lambda$ can be formed from $\mu$ by the addition
of a single box to some row. A standard Young tableau $T$ of shape $\lambda$ is a filling of the boxes of $\lambda$
with $[n]$ such that the rows and columns are strictly increasing. We write $\lambda = \mathrm{sh}(T)$. Note that $T$
can also be identified with a chain $\emptyset = \lambda^{(0)} \nearrow \lambda^{(1)} \nearrow \cdots \nearrow \lambda^{(n)} = \lambda$, where $\lambda^{(t)}$ is the shape of
the Young tableau formed from $T$ by entries $1..t$. A semistandard Young tableau of shape $\lambda$ and
alphabet $\mathcal{A}$ is a filling of the boxes with letters from $\mathcal{A}$ such that rows are increasing and columns
are strictly increasing. Here an alphabet means a totally ordered set of “letters”, usually $[d]$.

The quantum measurements we analyze involve the Schur-Weyl duality theorem. The symmetric
group $\mathfrak{S}(n)$ acts on $(\mathbb{C}^d)^{\otimes n}$ by permuting factors, and the general linear group $\mathrm{GL}_d$ acts on it
diagonally; furthermore, these actions commute. Schur-Weyl duality states that as an $\mathfrak{S}(n) \times \mathrm{GL}_d$
representation, we have the following unitary equivalence:

$$(\mathbb{C}^d)^{\otimes n} \cong \bigoplus_{\substack{\lambda \vdash n \\ \ell(\lambda) \le d}} \mathrm{Sp}_\lambda \otimes V_\lambda^d.$$

Here we are using the following notation: The Specht modules $\mathrm{Sp}_\lambda$ are the irreducible representation
spaces of $\mathfrak{S}(n)$, indexed by partitions $\lambda \vdash n$. We will use the abbreviation $\dim(\lambda)$ for $\dim(\mathrm{Sp}_\lambda)$;
recall this equals the number of standard Young tableaus of shape $\lambda$. The Schur (Weyl) modules
$V_\lambda^d$ are the irreducible polynomial representation spaces of $\mathrm{GL}_d$, indexed by partitions (highest
weights) $\lambda$ of length at most $d$. (For more background see, e.g., [Har05].) We will write $\pi_\lambda : \mathrm{GL}_d \to
\mathrm{End}(V_\lambda^d)$ for the (unitary) representation itself; the domain of $\pi_\lambda$ naturally extends to all of $\mathbb{C}^{d \times d}$
by continuity. We also write $|T_\lambda\rangle$ for the highest weight vector in $V_\lambda^d$; it is characterized by the
property that $\pi_\lambda(A) |T_\lambda\rangle = \left(\prod_{k=1}^{d} A_{kk}^{\lambda_k}\right) |T_\lambda\rangle$ if $A = (A_{ij})$ is upper-triangular.

The character of $V_\lambda^d$ is the Schur polynomial $s_\lambda(x_1, \ldots, x_d)$, a symmetric, degree-$|\lambda|$, homogeneous
polynomial in $x = (x_1, \ldots, x_d)$ defined by $s_\lambda(x) = a_{\lambda+\delta}(x)/a_\delta(x)$, where $\delta = (d-1, d-2, \ldots, 1, 0)$
and $a_\mu(x) = \det(x_i^{\mu_j})$. Alternatively, it may be defined as $\sum_T \prod_{i=1}^{d} x_i^{\#_i T}$, where $T$
ranges over all semistandard tableaus of shape $\lambda$ and alphabet $[d]$, and $\#_i T$ denotes the number of
occurrences of $i$ in $T$. We have $\dim(V_\lambda^d) = s_\lambda(1, \ldots, 1)$, the number of semistandard Young tableaus
in the sum. We’ll write $\Phi_\lambda(x_1, \ldots, x_d)$ for the normalized Schur polynomial $s_\lambda(x_1, \ldots, x_d)/s_\lambda(1, \ldots, 1)$.
Finally, we recall the following two formulas, the first following from Stanley’s hook-content formula
and the Frame-Robinson-Thrall hook-length formula, the second being the Weyl dimension formula:

$$\frac{s_\lambda(1, \ldots, 1)}{\dim(\lambda)} = \frac{1}{|\lambda|!} \prod_{[i,j] \in \lambda} (d + j - i), \qquad s_\lambda(1, \ldots, 1) = \prod_{1 \le i < j \le d} \frac{(\lambda_i - \lambda_j) + (j - i)}{j - i}. \qquad (1)$$
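The two formulas in (1) can be cross-checked numerically. The following is a minimal sketch, not part of the original text: the function names `dim_specht`, `weyl_dim` and `content_ratio` are hypothetical, and boxes are indexed from 0, which leaves the content $j - i$ unchanged.

```python
from fractions import Fraction
from math import factorial, prod


def dim_specht(shape):
    """dim(lambda) via the hook-length formula: n! / (product of hook lengths)."""
    n = sum(shape)
    # cols[j] = number of rows that reach column j (the conjugate partition).
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = prod(shape[i] - j + cols[j] - i - 1
                 for i in range(len(shape)) for j in range(shape[i]))
    return factorial(n) // hooks


def weyl_dim(shape, d):
    """s_lambda(1, ..., 1) with d ones, via the Weyl dimension formula."""
    lam = list(shape) + [0] * (d - len(shape))
    value = Fraction(1)
    for i in range(d):
        for j in range(i + 1, d):
            value *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return value


def content_ratio(shape, d):
    """Right-hand side of the first formula in (1): prod (d + j - i) / |lambda|!."""
    numerator = prod(d + j - i
                     for i in range(len(shape)) for j in range(shape[i]))
    return Fraction(numerator, factorial(sum(shape)))


# Both formulas should agree: s_lambda(1,...,1) = dim(lambda) * content_ratio.
for shape in [(3,), (2, 1), (2, 2, 1), (4, 2, 1)]:
    for d in range(3, 6):
        assert weyl_dim(shape, d) == dim_specht(shape) * content_ratio(shape, d)
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.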


Given a positive semidefinite matrix $\rho \in \mathbb{C}^{d \times d}$, we typically write $\alpha \in \mathbb{R}^d$ for its sorted spectrum;
i.e., its eigenvalues $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_d \ge 0$. When $\rho$ has trace 1 it is called a density matrix (or
mixed state), and in this case $\alpha$ defines a (sorted) probability distribution on $[d]$.

We will several times use the following elementary majorization inequality:

$$\text{If } c, x, y \in \mathbb{R}^d \text{ are sorted (decreasing) and } x \succ y \text{ then } c \cdot x \ge c \cdot y. \qquad (2)$$


Recall [Ful97] that the Robinson-Schensted-Knuth correspondence is a certain bijection between
strings $w \in \mathcal{A}^n$ and pairs $(P, Q)$, where $P$ is a semistandard insertion tableau filled by the
multiset of letters in $w$, and $Q$ is a standard recording tableau, satisfying $\mathrm{sh}(Q) = \mathrm{sh}(P)$. We
write $\mathrm{RSK}(w) = (P, Q)$ and write $\mathrm{shRSK}(w)$ for the common shape of $P$ and $Q$, a partition of $n$
of length at most $|\mathcal{A}|$. One way to characterize $\lambda = \mathrm{shRSK}(w)$ is by Greene’s Theorem [Gre74]:
$\lambda_1 + \cdots + \lambda_k$ is the length of the longest disjoint union of $k$ increasing subsequences in $w$. In particular,
$\lambda_1 = \mathrm{LIS}(w)$, the length of the longest increasing (i.e., nondecreasing) subsequence in $w$.
We remind the reader here of the distinction between a subsequence of a string, in which the letters
need not be consecutive, and a substring, in which they are. We use the notation $w[i..j]$ for the
substring $(w_i, w_{i+1}, \ldots, w_j) \in \mathcal{A}^{j-i+1}$.
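To make the shape $\mathrm{shRSK}(w)$ concrete, here is a small illustrative sketch (not from the original text) of row insertion that tracks only the insertion tableau, together with a brute-force check of the $k = 1$ case of Greene’s Theorem. The helper names `rsk_shape` and `longest_nondecreasing` are hypothetical.

```python
from bisect import bisect_right
from itertools import product


def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word`, via row insertion."""
    rows = []  # rows of the insertion tableau P; each row is weakly increasing
    for letter in word:
        for row in rows:
            # Position of the leftmost entry strictly greater than `letter`.
            pos = bisect_right(row, letter)
            if pos == len(row):
                row.append(letter)  # letter fits at the end of this row
                break
            # Otherwise bump that entry down into the next row.
            row[pos], letter = letter, row[pos]
        else:
            rows.append([letter])  # bumped out of every row: start a new one
    return tuple(len(row) for row in rows)


def longest_nondecreasing(word):
    """Length of the longest weakly increasing subsequence (quadratic DP)."""
    best = []
    for j, x in enumerate(word):
        best.append(1 + max((best[i] for i in range(j) if word[i] <= x),
                            default=0))
    return max(best, default=0)


# Greene's Theorem with k = 1: lambda_1 = LIS(w), checked on all words in [3]^5.
for w in product((1, 2, 3), repeat=5):
    assert rsk_shape(w)[0] == longest_nondecreasing(w)
```

For instance, the word `(2, 1, 2, 1, 3)` has shape `(3, 2)`: its longest nondecreasing subsequence has length 3, and the whole word splits into the two increasing subsequences `(2, 2, 3)` and `(1, 1)`.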

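Let $\alpha = (\alpha_1, \ldots, \alpha_d)$ denote a probability distribution on alphabet $[d]$, let $\alpha^{\otimes n}$ denote the
associated product probability distribution on $[d]^n$, and write $\alpha^{\otimes \infty}$ for the product probability
distribution on infinite sequences. We define the associated Schur-Weyl growth process to be the
(random) sequence

$$\emptyset = \boldsymbol{\lambda}^{(0)} \nearrow \boldsymbol{\lambda}^{(1)} \nearrow \boldsymbol{\lambda}^{(2)} \nearrow \boldsymbol{\lambda}^{(3)} \nearrow \cdots \qquad (3)$$

where $\boldsymbol{w} \sim \alpha^{\otimes \infty}$ and $\boldsymbol{\lambda}^{(t)} = \mathrm{shRSK}(\boldsymbol{w}[1..t])$. Note that the marginal distribution on $\boldsymbol{\lambda}^{(n)}$ is what
we call $\mathrm{SW}^n(\alpha)$. The Schur-Weyl growth process was studied in, e.g., [O’C03], wherein it was
noted that the RSK correspondence implies

$$\Pr[\boldsymbol{\lambda}^{(t)} = \lambda^{(t)} \ \forall t \le n] = s_{\lambda^{(n)}}(\alpha) \qquad (4)$$

for any chain $\emptyset = \lambda^{(0)} \nearrow \cdots \nearrow \lambda^{(n)}$. (Together with the fact that $s_\lambda(\alpha)$ is homogeneous of
degree $|\lambda|$, this gives yet another alternate definition of the Schur polynomials.) One consequence
of this is that for any $i \in [d]$ we have

$$\Pr[\boldsymbol{\lambda}^{(n+1)} = \lambda + e_i \mid \boldsymbol{\lambda}^{(n)} = \lambda] = \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)}. \qquad (5)$$

(This formula is correct even when $\lambda + e_i$ is not a valid partition of $n + 1$: in this case $s_{\lambda+e_i} \equiv 0$
formally under the determinantal definition.) The above equation is also a probabilistic interpretation
of the following special case of Pieri’s rule:

$$(x_1 + \cdots + x_d)\, s_\lambda(x_1, \ldots, x_d) = \sum_{i=1}^{d} s_{\lambda+e_i}(x_1, \ldots, x_d). \qquad (6)$$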
We will need the following consequence of (5):

Proposition 2.1. Let $\lambda \vdash n$ and let $\alpha \in \mathbb{R}^d$ be a sorted probability distribution. Then

$$\left(\frac{s_{\lambda+e_1}(\alpha)}{s_\lambda(\alpha)}, \ldots, \frac{s_{\lambda+e_d}(\alpha)}{s_\lambda(\alpha)}\right) \succ (\alpha_1, \ldots, \alpha_d). \qquad (7)$$

Proof. Let $\beta$ be the reversal of $\alpha$ (i.e. $\beta_i = \alpha_{d-i+1}$) and let $(\boldsymbol{\lambda}^{(t)})_{t \ge 0}$ be a Schur-Weyl growth
process corresponding to $\beta$. By (5) and the fact that the Schur polynomials are symmetric, we
conclude that the vector on the left of (7) is $(p_1, \ldots, p_d)$, where $p_i = \Pr[\boldsymbol{\lambda}^{(n+1)} = \lambda + e_i \mid \boldsymbol{\lambda}^{(n)} = \lambda]$.
Now $p_1 + \cdots + p_k$ is the probability, conditioned on $\boldsymbol{\lambda}^{(n)} = \lambda$, that the $(n+1)$th box in the process
enters into one of the first $k$ rows. But this is indeed at least $\alpha_1 + \cdots + \alpha_k = \beta_d + \cdots + \beta_{d-k+1}$,
because the latter represents the probability that the $(n+1)$th letter is $d - k + 1$ or higher, and
such a letter will always be inserted within the first $k$ rows under RSK. □

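A further consequence of (4) (perhaps first noted in [ITW01]) is that for $\lambda \vdash n$,

$$\Pr_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)}[\boldsymbol{\lambda} = \lambda] = \dim(\lambda)\, s_\lambda(\alpha). \qquad (8)$$

At the same time, as noted in [ARS88] (see also [Aud06, Equation (36)]) it follows from Schur-Weyl
duality that if $\rho \in \mathbb{C}^{d \times d}$ is a density matrix with spectrum $\alpha$ then

$$\mathrm{tr}(\Pi_\lambda \rho^{\otimes n}) = \dim(\lambda)\, s_\lambda(\alpha),$$

where $\Pi_\lambda$ denotes the isotypic projection onto $\mathrm{Sp}_\lambda \otimes V_\lambda^d$. Thus we have the identity

$$\mathrm{tr}(\Pi_\lambda \rho^{\otimes n}) = \Pr_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)}[\boldsymbol{\lambda} = \lambda]. \qquad (9)$$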
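Identity (8) can be checked exactly on a small instance. The sketch below is illustrative only: it enumerates all words in $[d]^n$, computes the RSK shape distribution, and compares it with $\dim(\lambda) s_\lambda(\alpha)$, where $s_\lambda$ is evaluated through the determinantal definition. All function names are hypothetical, and the bialternant evaluation assumes the entries of $\alpha$ are distinct (otherwise the denominator vanishes).

```python
from bisect import bisect_right
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod


def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word`, via row insertion."""
    rows = []
    for letter in word:
        for row in rows:
            pos = bisect_right(row, letter)
            if pos == len(row):
                row.append(letter)
                break
            row[pos], letter = letter, row[pos]
        else:
            rows.append([letter])
    return tuple(len(row) for row in rows)


def det(matrix):
    """Leibniz-formula determinant; adequate for the tiny matrices used here."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[a] > perm[b]
                         for a in range(n) for b in range(a + 1, n))
        total += (-1) ** inversions * prod(matrix[r][perm[r]] for r in range(n))
    return total


def schur(shape, x):
    """s_lambda(x) = a_{lambda+delta}(x) / a_delta(x); needs distinct x_i."""
    d = len(x)
    lam = list(shape) + [0] * (d - len(shape))
    delta = [d - 1 - j for j in range(d)]
    numerator = det([[xi ** (lam[j] + delta[j]) for j in range(d)] for xi in x])
    denominator = det([[xi ** delta[j] for j in range(d)] for xi in x])
    return numerator / denominator


def dim_specht(shape):
    """dim(lambda) via the hook-length formula."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = prod(shape[i] - j + cols[j] - i - 1
                 for i in range(len(shape)) for j in range(shape[i]))
    return factorial(sum(shape)) // hooks


def sw_distribution(alpha, n):
    """Exact Schur-Weyl distribution SW^n(alpha), by enumerating all words."""
    dist = defaultdict(Fraction)
    for word in product(range(len(alpha)), repeat=n):
        dist[rsk_shape(word)] += prod(alpha[letter] for letter in word)
    return dict(dist)


alpha = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
for shape, probability in sw_distribution(alpha, 4).items():
    assert probability == dim_specht(shape) * schur(shape, alpha)
```

The enumeration costs $d^n$ word insertions, so this is only a sanity check for small $n$ and $d$, not a practical sampler.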


3 Spectrum estimation 


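Several groups of researchers suggested the following method for estimating the sorted spectrum $\alpha$ of
a quantum mixed state $\rho \in \mathbb{C}^{d \times d}$: measure $\rho^{\otimes n}$ according to the isotypic projectors $\{\Pi_\lambda\}_{\lambda \vdash n}$; and,
on obtaining $\lambda$, output the estimate $\hat{\alpha} = \underline{\lambda} = (\lambda_1/n, \ldots, \lambda_d/n)$. The measurement is sometimes
called “weak Schur sampling” [CHW07] and we refer to the overall procedure as the “Empirical
Young Diagram (EYD)” algorithm. We remark that the algorithm’s behavior depends only on
the rank $r$ of $\rho$; it is indifferent to the ambient dimension $d$. So while we will analyze the EYD
algorithm in terms of $d$, we will present the results in terms of $r$.

In [HM02, CM06] it is shown that $n = O(r^2 \log(r/\epsilon)/\epsilon^2)$ suffices for EYD to obtain $d_{\mathrm{KL}}(\underline{\boldsymbol{\lambda}}, \alpha) \le
2\epsilon^2$ and hence $d_{\mathrm{TV}}(\underline{\boldsymbol{\lambda}}, \alpha) \le \epsilon$ with high probability. However we give a different analysis. By
equation (9), the expected $\ell_2^2$-error of the EYD algorithm is precisely $\mathbf{E}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)} \|\underline{\boldsymbol{\lambda}} - \alpha\|_2^2$. Theorem 1.1,
which we prove in this section, bounds this quantity by $\frac{r}{n}$. Thus

$$\mathbf{E}\, d_{\mathrm{TV}}(\underline{\boldsymbol{\lambda}}, \alpha) = \tfrac{1}{2} \mathbf{E} \|\underline{\boldsymbol{\lambda}} - \alpha\|_1 \le \tfrac{1}{2} \sqrt{r}\, \mathbf{E} \|\underline{\boldsymbol{\lambda}} - \alpha\|_2 \le \tfrac{1}{2} \sqrt{r} \sqrt{\mathbf{E} \|\underline{\boldsymbol{\lambda}} - \alpha\|_2^2} \le \frac{r}{2\sqrt{n}},$$

which is bounded by $\epsilon/4$, say, if $n = 4r^2/\epsilon^2$. Thus in this case $\Pr[d_{\mathrm{TV}}(\underline{\boldsymbol{\lambda}}, \alpha) > \epsilon] < 1/4$. By a
standard amplification (repeating the EYD algorithm $O(\log 1/\delta)$ times and outputting the estimate
which is within $2\epsilon$ total variation distance of the most other estimates), we obtain Corollary 1.3.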


We give two lemmas, and then the proof of Theorem 1.1. 








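Lemma 3.1. Let $\alpha \in \mathbb{R}^d$ be a probability distribution. Then

$$\mathop{\mathbf{E}}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)} \sum_{i=1}^{d} \boldsymbol{\lambda}_i^2 \le \sum_{i=1}^{d} (n\alpha_i)^2 + dn.$$

Proof. Define the polynomial function

$$p_2^*(\lambda) = \sum_{i=1}^{\ell(\lambda)} \left( (\lambda_i - i + \tfrac{1}{2})^2 - (-i + \tfrac{1}{2})^2 \right).$$

By Proposition 2.34 and equation (12) of [OW15], $\mathbf{E}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)}[p_2^*(\boldsymbol{\lambda})] = n(n-1) \cdot \sum_{i=1}^{d} \alpha_i^2$. Hence,

$$\mathbf{E} \sum_{i=1}^{d} \boldsymbol{\lambda}_i^2 = \mathbf{E}\left[ p_2^*(\boldsymbol{\lambda}) + \sum_{i=1}^{d} (2i-1) \boldsymbol{\lambda}_i \right] \le \mathbf{E}\, p_2^*(\boldsymbol{\lambda}) + \sum_{i=1}^{d} (2i-1)(n/d) \le n^2 \cdot \sum_{i=1}^{d} \alpha_i^2 + dn.$$

Here the first inequality used inequality (2) and $\boldsymbol{\lambda} \succ (n/d, \ldots, n/d)$. □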


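Lemma 3.2. Let $\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)$, where $\alpha \in \mathbb{R}^d$ is a sorted probability distribution. Then
$(\mathbf{E}\,\boldsymbol{\lambda}_1, \ldots, \mathbf{E}\,\boldsymbol{\lambda}_d) \succ (\alpha_1 n, \ldots, \alpha_d n)$.

Proof. Let $\boldsymbol{w} \sim \alpha^{\otimes n}$, so $\boldsymbol{\lambda}$ is distributed as $\mathrm{shRSK}(\boldsymbol{w})$. The proof is completed by linearity of
expectation applied to the fact that $(\boldsymbol{\lambda}_1, \ldots, \boldsymbol{\lambda}_d) \succ (\#_1 \boldsymbol{w}, \ldots, \#_d \boldsymbol{w})$ always, where $\#_k \boldsymbol{w}$ denotes
the number of times letter $k$ appears in $\boldsymbol{w}$. In turn this fact holds by Greene’s Theorem: we can
form $k$ disjoint increasing subsequences in $\boldsymbol{w}$ by taking all its 1’s, all its 2’s, ..., all its $k$’s. □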

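Proof of Theorem 1.1. We have

$$\begin{aligned}
n^2 \cdot \mathop{\mathbf{E}}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)} \|\underline{\boldsymbol{\lambda}} - \alpha\|_2^2
&= \mathbf{E} \sum_{i=1}^{d} (\boldsymbol{\lambda}_i - \alpha_i n)^2
= \mathbf{E} \sum_{i=1}^{d} (\boldsymbol{\lambda}_i^2 + (\alpha_i n)^2) - 2 \sum_{i=1}^{d} (\alpha_i n) \cdot \mathbf{E}\,\boldsymbol{\lambda}_i \\
&\le dn + 2 \sum_{i=1}^{d} (\alpha_i n)^2 - 2 \sum_{i=1}^{d} (\alpha_i n) \cdot \mathbf{E}\,\boldsymbol{\lambda}_i
\le dn + 2 \sum_{i=1}^{d} (\alpha_i n)^2 - 2 \sum_{i=1}^{d} (\alpha_i n) \cdot (\alpha_i n) = dn,
\end{aligned}$$

where the first inequality used Lemma 3.1 and the second used Lemma 3.2 and inequality (2) (recall
that the coefficients $\alpha_i n$ are decreasing). Dividing by $n^2$ completes the proof. □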
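As a sanity check on Theorem 1.1, the expectation can be computed exactly for small $n$ and $d$ by classical simulation of the Schur-Weyl distribution. This is a minimal sketch under the assumption that $\alpha$ is given sorted in decreasing order and that letter $i$ (0-indexed) has probability `alpha[i]`; the name `expected_squared_error` is hypothetical.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import product
from math import prod


def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word`, via row insertion."""
    rows = []
    for letter in word:
        for row in rows:
            pos = bisect_right(row, letter)
            if pos == len(row):
                row.append(letter)
                break
            row[pos], letter = letter, row[pos]
        else:
            rows.append([letter])
    return tuple(len(row) for row in rows)


def expected_squared_error(alpha, n):
    """Exact E ||lambda/n - alpha||_2^2 for lambda ~ SW^n(alpha), by enumeration."""
    d = len(alpha)
    total = Fraction(0)
    for word in product(range(d), repeat=n):
        weight = prod(alpha[letter] for letter in word)
        shape = rsk_shape(word)
        shape += (0,) * (d - len(shape))  # pad with trailing zeroes
        total += weight * sum((Fraction(shape[i], n) - alpha[i]) ** 2
                              for i in range(d))
    return total


alpha = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
for n in range(1, 7):
    # Theorem 1.1: the expected squared l2 error is at most d/n.
    assert expected_squared_error(alpha, n) <= Fraction(len(alpha), n)
```

The check only exercises the classical distribution $\mathrm{SW}^n(\alpha)$; by identity (9) this is the same distribution that the EYD measurement produces.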


4 Quantum state tomography 

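In this section we analyze the tomography algorithm proposed by Keyl [Key06] based on projection
to the highest weight vector. Keyl’s method, when applied to a density matrix $\rho \in \mathbb{C}^{d \times d}$ with sorted
spectrum $\alpha$, begins by performing weak Schur sampling on $\rho^{\otimes n}$. Supposing the partition thereby
obtained from $\mathrm{SW}^n(\alpha)$ is $\lambda \vdash n$, the state collapses to $\frac{1}{s_\lambda(\alpha)} \pi_\lambda(\rho) \in V_\lambda^d$. The main step of Keyl’s
algorithm is now to perform a normalized POVM within $V_\lambda^d$ whose outcomes are unitary matrices
in $\mathrm{U}(d)$. Specifically, his measurement maps a (Borel) subset $F \subseteq \mathrm{U}(d)$ to

$$M(F) := \int_F \pi_\lambda(U) |T_\lambda\rangle \langle T_\lambda| \pi_\lambda(U)^\dagger \cdot \dim(V_\lambda^d)\, dU,$$

where $dU$ denotes Haar measure on $\mathrm{U}(d)$. (To see that this is indeed a POVM, i.e., that $M :=
M(\mathrm{U}(d)) = I$, first note that the translation invariance of Haar measure implies $\pi_\lambda(V) M \pi_\lambda(V)^\dagger =
M$ for any $V \in \mathrm{U}(d)$. Thinking of $\pi_\lambda$ as an irreducible representation of the unitary group, Schur’s
lemma implies $M$ must be a scalar matrix. Taking traces shows $M$ is the identity.)

We write $K_\lambda(\rho)$ for the probability distribution on $\mathrm{U}(d)$ associated to this POVM; its density
with respect to the Haar measure is therefore

$$\mathrm{tr}\left( \pi_\lambda\left(\tfrac{1}{s_\lambda(\alpha)} \rho\right) \pi_\lambda(U) |T_\lambda\rangle \langle T_\lambda| \pi_\lambda(U)^\dagger \cdot \dim(V_\lambda^d) \right) = \Phi_\lambda(\alpha)^{-1} \cdot \langle T_\lambda| \pi_\lambda(U^\dagger \rho U) |T_\lambda\rangle. \qquad (10)$$

Supposing the outcome of the measurement is $U$, Keyl’s final estimate for $\rho$ is $\hat{\rho} = U \mathrm{diag}(\underline{\lambda}) U^\dagger$.
Thus the expected Frobenius-squared error of Keyl’s tomography algorithm is precisely

$$\mathop{\mathbf{E}}_{\substack{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha) \\ \boldsymbol{U} \sim K_{\boldsymbol{\lambda}}(\rho)}} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_F^2.$$

Theorem 1.2, which we prove in this section, bounds the above quantity by $\frac{4d-3}{n}$. Let us assume
now that $\mathrm{rank}(\rho) \le r$. Then $\ell(\boldsymbol{\lambda}) \le r$ always and hence the estimate $\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger$ will also have
rank at most $r$. Thus by Cauchy-Schwarz applied to the singular values of $\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho$ (a matrix of rank at most $2r$),

$$\mathbf{E}\, d_{\mathrm{tr}}(\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger, \rho) = \tfrac{1}{2} \mathbf{E} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_1 \le \tfrac{1}{2} \sqrt{2r}\, \mathbf{E} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_F \le \sqrt{r/2} \sqrt{\mathbf{E} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_F^2} \le \sqrt{\frac{r(4d-3)}{2n}},$$

and Corollary 1.4 follows just as Corollary 1.3 did.

The remainder of this section is devoted to the proof of Theorem 1.2.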


4.1 Integration formulas 

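Notation 4.1. Let $Z \in \mathbb{C}^{d \times d}$ and let $\lambda$ be a partition of length at most $d$. The generalized power
function $\Delta_\lambda$ is defined by

$$\Delta_\lambda(Z) = \prod_{k=1}^{d} \mathrm{pm}_k(Z)^{\lambda_k - \lambda_{k+1}},$$

where $\mathrm{pm}_k(Z)$ denotes the $k$th principal minor of Z (and $\lambda_{d+1} = 0$).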
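A small numerical illustration of the generalized power function may help. The sketch below is not from the original text: it restricts to a real symmetric positive definite integer matrix (a special case of the complex setting), takes $\mathrm{pm}_k$ to be the leading $k \times k$ minor, and checks the value against the product of powers of the Cholesky diagonal as well as the multiplicativity of $\Delta$ in its index. The function names are hypothetical.

```python
import math
from itertools import permutations


def det(matrix):
    """Leibniz-formula determinant; adequate for tiny matrices."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[a] > perm[b]
                         for a in range(n) for b in range(a + 1, n))
        total += (-1) ** inversions * math.prod(matrix[r][perm[r]]
                                                for r in range(n))
    return total


def leading_minors(z):
    """pm_1(Z), ..., pm_d(Z): determinants of the top-left k x k blocks."""
    return [det([row[:k] for row in z[:k]]) for k in range(1, len(z) + 1)]


def generalized_power(shape, z):
    """Delta_lambda(Z) = prod_k pm_k(Z)^(lambda_k - lambda_{k+1})."""
    d = len(z)
    lam = list(shape) + [0] * (d + 1 - len(shape))  # includes lambda_{d+1} = 0
    minors = leading_minors(z)
    return math.prod(minors[k] ** (lam[k] - lam[k + 1]) for k in range(d))


def cholesky_diagonal(z):
    """Diagonal of the lower-triangular L with Z = L L^T (real symmetric Z)."""
    d = len(z)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                low[i][j] = math.sqrt(z[i][i] - partial)
            else:
                low[i][j] = (z[i][j] - partial) / low[j][j]
    return [low[k][k] for k in range(d)]


Z = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]  # symmetric positive definite
shape = (3, 1)
diagonal = cholesky_diagonal(Z)
via_cholesky = math.prod(diagonal[k] ** (2 * lam_k)
                         for k, lam_k in enumerate(shape))
assert math.isclose(generalized_power(shape, Z), via_cholesky)
# Multiplicativity in the index: Delta_lambda * Delta_mu = Delta_{lambda + mu}.
assert (generalized_power((3, 1), Z) * generalized_power((2, 2, 1), Z)
        == generalized_power((5, 3, 1), Z))
```

The Cholesky comparison works because $\mathrm{pm}_k(LL^\top) = \prod_{j \le k} L_{jj}^2$, so the telescoping exponents $\lambda_k - \lambda_{k+1}$ collapse to $\prod_k L_{kk}^{2\lambda_k}$.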

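As noted by Keyl [Key06, equation (141)], when $Z$ is positive semidefinite we have $\langle T_\lambda| \pi_\lambda(Z) |T_\lambda\rangle =
\Delta_\lambda(Z)$; this follows by writing $Z = LL^\dagger$ for $L = (L_{ij})$ lower triangular with nonnegative diagonal
and using the fact that $\Delta_\lambda(Z) = \Delta_\lambda(L^\dagger)^2 = \prod_{k=1}^{d} L_{kk}^{2\lambda_k}$. Putting this into (10) we have an alternate
definition for the distribution $K_\lambda(\rho)$:

$$\mathop{\mathbf{E}}_{\boldsymbol{U} \sim K_\lambda(\rho)} f(\boldsymbol{U}) = \Phi_\lambda(\alpha)^{-1} \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \left[ f(\boldsymbol{U}) \cdot \Delta_\lambda(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) \right], \qquad (11)$$

where $\boldsymbol{U} \sim \mathrm{U}(d)$ denotes that $\boldsymbol{U}$ has the Haar measure. For example, taking $f \equiv 1$ yields the
identity

$$\mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \Delta_\lambda(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) = \Phi_\lambda(\alpha); \qquad (12)$$

this expresses the fact that the spherical polynomial of weight $\lambda$ for $\mathrm{GL}_d/\mathrm{U}(d)$ is precisely the
normalized Schur polynomial (see, e.g., [Far15]). For a further example, taking $f(U) = \Delta_\mu(U^\dagger \rho U)$
and using the fact that $\Delta_\lambda \cdot \Delta_\mu = \Delta_{\lambda+\mu}$, we obtain

$$\mathop{\mathbf{E}}_{\boldsymbol{U} \sim K_\lambda(\rho)} \Delta_\mu(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) = \frac{\Phi_{\lambda+\mu}(\alpha)}{\Phi_\lambda(\alpha)}; \quad \text{in particular,} \quad \mathop{\mathbf{E}}_{\boldsymbol{U} \sim K_\lambda(\rho)} (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{1,1} = \frac{\Phi_{\lambda+e_1}(\alpha)}{\Phi_\lambda(\alpha)}. \qquad (13)$$

For our proof of Theorem 1.2, we will need to develop and analyze a more general formula for the
expected diagonal entry $\mathbf{E}\,(\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{k,k}$. We begin with some lemmas.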











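Definition 4.2. For $\lambda$ a partition and $m$ a positive integer we define the following partition of
height (at most) $m$:

$$\lambda^{[m]} = (\lambda_1 - \lambda_{m+1}, \ldots, \lambda_m - \lambda_{m+1}).$$

We also define the following “complementary” partition $\lambda_{[m]}$ satisfying $\lambda = \lambda^{[m]} + \lambda_{[m]}$:

$$(\lambda_{[m]})_i = \begin{cases} \lambda_{m+1} & i \le m, \\ \lambda_i & i \ge m + 1. \end{cases}$$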


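Lemma 4.3. Let $\rho \in \mathbb{C}^{d \times d}$ be a density matrix with spectrum $\alpha$ and let $\lambda \vdash n$ have height at
most $d$. Let $m \in [d]$ and let $f_m$ be an $m$-variate symmetric polynomial. Then

$$\mathop{\mathbf{E}}_{\boldsymbol{U} \sim K_\lambda(\rho)} f_m(\boldsymbol{\beta}) = \Phi_\lambda(\alpha)^{-1} \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \left[ f_m(\boldsymbol{\beta}) \cdot \Phi_{\lambda^{[m]}}(\boldsymbol{\beta}) \cdot \Delta_{\lambda_{[m]}}(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) \right],$$

where we write $\boldsymbol{\beta} = \mathrm{spec}_m(\boldsymbol{U}^\dagger \rho \boldsymbol{U})$ for the spectrum of the top-left $m \times m$ submatrix of $\boldsymbol{U}^\dagger \rho \boldsymbol{U}$.

Proof. Let $\boldsymbol{V} \sim \mathrm{U}(m)$ and write $\overline{\boldsymbol{V}} = \boldsymbol{V} \oplus I$, where $I$ is the $(d - m)$-dimensional identity matrix.
By translation-invariance of Haar measure we have $\boldsymbol{U}\overline{\boldsymbol{V}} \sim \mathrm{U}(d)$, and hence from (11),

$$\mathop{\mathbf{E}}_{\boldsymbol{U} \sim K_\lambda(\rho)} f_m(\boldsymbol{\beta}) = \Phi_\lambda(\alpha)^{-1} \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d),\, \boldsymbol{V} \sim \mathrm{U}(m)} \left[ f_m(\mathrm{spec}_m(\overline{\boldsymbol{V}}^\dagger \boldsymbol{U}^\dagger \rho \boldsymbol{U} \overline{\boldsymbol{V}})) \cdot \Delta_\lambda(\overline{\boldsymbol{V}}^\dagger \boldsymbol{U}^\dagger \rho \boldsymbol{U} \overline{\boldsymbol{V}}) \right]. \qquad (14)$$

Note that conjugating a matrix by $\overline{\boldsymbol{V}}$ does not change the spectrum of its upper-left $k \times k$ block
for any $k \ge m$. Thus $\mathrm{spec}_m(\overline{\boldsymbol{V}}^\dagger \boldsymbol{U}^\dagger \rho \boldsymbol{U} \overline{\boldsymbol{V}})$ is identical to $\boldsymbol{\beta}$, and $\mathrm{pm}_k(\overline{\boldsymbol{V}}^\dagger \boldsymbol{U}^\dagger \rho \boldsymbol{U} \overline{\boldsymbol{V}}) = \mathrm{pm}_k(\boldsymbol{U}^\dagger \rho \boldsymbol{U})$
for all $k \ge m$. Thus using $\Delta_\lambda = \Delta_{\lambda^{[m]}} \cdot \Delta_{\lambda_{[m]}}$ we have

$$(14) = \Phi_\lambda(\alpha)^{-1} \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \left[ f_m(\boldsymbol{\beta}) \cdot \Delta_{\lambda_{[m]}}(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) \cdot \mathop{\mathbf{E}}_{\boldsymbol{V} \sim \mathrm{U}(m)} \left[ \Delta_{\lambda^{[m]}}(\overline{\boldsymbol{V}}^\dagger \boldsymbol{U}^\dagger \rho \boldsymbol{U} \overline{\boldsymbol{V}}) \right] \right].$$

But the inner expectation equals $\Phi_{\lambda^{[m]}}(\boldsymbol{\beta})$ by (12), completing the proof. □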


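Lemma 4.4. In the setting of Lemma 4.3,

$$\mathop{\mathbf{E}}_{\boldsymbol{U} \sim K_\lambda(\rho)} \mathop{\mathrm{avg}}_{i=1}^{m} \left\{ (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{i,i} \right\} = \sum_{i=1}^{m} \frac{s_{\lambda^{[m]}+e_i}(1/m)}{s_{\lambda^{[m]}}(1/m)} \cdot \frac{\Phi_{\lambda+e_i}(\alpha)}{\Phi_\lambda(\alpha)}, \qquad (15)$$

where $1/m$ abbreviates $1/m, \ldots, 1/m$ (repeated $m$ times).

Remark 4.5. The right-hand side of (15) is also a weighted average of the quantities $\Phi_{\lambda+e_i}(\alpha)/\Phi_\lambda(\alpha)$,
by virtue of (5). The lemma also generalizes (13), as $s_{\lambda^{[1]}+e_1}(1)/s_{\lambda^{[1]}}(1)$ is simply 1.

Proof. On the left-hand side of (15) we have $\frac{1}{m}$ times the expected trace of the upper-left $m \times m$
submatrix of $\boldsymbol{U}^\dagger \rho \boldsymbol{U}$. So by applying Lemma 4.3 with $f_m(\beta) = \frac{1}{m}(\beta_1 + \cdots + \beta_m)$, it is equal to

$$\begin{aligned}
&\Phi_\lambda(\alpha)^{-1} \cdot \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \left[ \tfrac{1}{m}(\boldsymbol{\beta}_1 + \cdots + \boldsymbol{\beta}_m) \cdot \frac{s_{\lambda^{[m]}}(\boldsymbol{\beta})}{s_{\lambda^{[m]}}(1, \ldots, 1)} \cdot \Delta_{\lambda_{[m]}}(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) \right] \\
&= \Phi_\lambda(\alpha)^{-1} \cdot \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \left[ \tfrac{1}{m} \sum_{i=1}^{m} \frac{s_{\lambda^{[m]}+e_i}(\boldsymbol{\beta})}{s_{\lambda^{[m]}}(1, \ldots, 1)} \cdot \Delta_{\lambda_{[m]}}(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) \right] \qquad \text{(by Pieri (6))} \\
&= \Phi_\lambda(\alpha)^{-1} \cdot \sum_{i=1}^{m} \frac{s_{\lambda^{[m]}+e_i}(1, \ldots, 1)}{m \cdot s_{\lambda^{[m]}}(1, \ldots, 1)} \mathop{\mathbf{E}}_{\boldsymbol{U} \sim \mathrm{U}(d)} \left[ \Phi_{\lambda^{[m]}+e_i}(\boldsymbol{\beta}) \cdot \Delta_{\lambda_{[m]}}(\boldsymbol{U}^\dagger \rho \boldsymbol{U}) \right] \\
&= \Phi_\lambda(\alpha)^{-1} \cdot \sum_{i=1}^{m} \frac{s_{\lambda^{[m]}+e_i}(1, \ldots, 1)}{m \cdot s_{\lambda^{[m]}}(1, \ldots, 1)} \cdot \Phi_{\lambda+e_i}(\alpha),
\end{aligned}$$

where in the last step we used Lemma 4.3 again, with $f_m \equiv 1$ and $\lambda + e_i$ in place of $\lambda$. But this is
equal to the right-hand side of (15), using the homogeneity of Schur polynomials. □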

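Lemma 4.6. Assume the setting of Lemma 4.3. Then $\eta_m := \mathbf{E}_{\boldsymbol{U} \sim K_\lambda(\rho)} (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{m,m}$ is a convex
combination of the quantities $R_i := \Phi_{\lambda+e_i}(\alpha)/\Phi_\lambda(\alpha)$, $1 \le i \le m$.¹

Proof. This is clear for $m = 1$. For $m > 1$, Remark 4.5 implies

$$\mathop{\mathrm{avg}}_{i=1}^{m} \{\eta_i\} = p_1 R_1 + \cdots + p_m R_m, \qquad \mathop{\mathrm{avg}}_{i=1}^{m-1} \{\eta_i\} = q_1 R_1 + \cdots + q_m R_m,$$

where $p_1 + \cdots + p_m = q_1 + \cdots + q_m = 1$ and $q_m = 0$. Thus $\eta_m = \sum_{i=1}^{m} r_i R_i$, where $r_i = (m p_i - (m-1) q_i)$,
and evidently $\sum_{i=1}^{m} r_i = m - (m-1) = 1$. It remains to verify that each $r_i \ge 0$. This is obvious
for $i = m$; for $i < m$, we must check that

$$\frac{s_{\lambda^{[m]}+e_i}(1, \ldots, 1)}{s_{\lambda^{[m]}}(1, \ldots, 1)} \ge \frac{s_{\lambda^{[m-1]}+e_i}(1, \ldots, 1)}{s_{\lambda^{[m-1]}}(1, \ldots, 1)}, \qquad (16)$$

where the Schur polynomials on the left are evaluated at $m$ ones and those on the right at $m - 1$ ones.
Using the Weyl dimension formula from (1), one may explicitly compute that the ratio of the left
side of (16) to the right side is precisely $1 + \frac{1}{(\lambda_i - \lambda_m) + (m - i)} \ge 1$. This completes the proof. □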

We will in fact only need the following corollary: 

Corollary 4.7. Let $\rho \in \mathbb{C}^{d \times d}$ be a density matrix with spectrum $\alpha$ and let $\lambda \vdash n$ have height at
most $d$. Then $\mathbf{E}_{\boldsymbol{U} \sim K_\lambda(\rho)} (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{m,m} \ge \Phi_{\lambda+e_m}(\alpha)/\Phi_\lambda(\alpha)$ for every $m \in [d]$.

Proof. This is immediate from Lemma 4.6 and the fact that $\Phi_{\lambda+e_i}(\alpha) \ge \Phi_{\lambda+e_m}(\alpha)$ whenever $i < m$
(assuming $\lambda + e_i$ is a valid partition). This latter fact was recently proved by Sra [Sra15], verifying
a conjecture of Cuttler et al. [CGS11]. □

4.2 Proof of Theorem 1.2 

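Throughout the proof we assume $\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)$ and $\boldsymbol{U} \sim K_{\boldsymbol{\lambda}}(\rho)$. We have

$$\begin{aligned}
n^2 \cdot \mathop{\mathbf{E}}_{\boldsymbol{\lambda}, \boldsymbol{U}} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_F^2
&= n^2 \cdot \mathop{\mathbf{E}}_{\boldsymbol{\lambda}, \boldsymbol{U}} \|\mathrm{diag}(\underline{\boldsymbol{\lambda}}) - \boldsymbol{U}^\dagger \rho \boldsymbol{U}\|_F^2 \\
&= \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i^2 + \sum_{i=1}^{d} (\alpha_i n)^2 - 2n \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \mathop{\mathbf{E}}_{\boldsymbol{U}} (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{i,i} \\
&\le dn + 2 \sum_{i=1}^{d} (\alpha_i n)^2 - 2n \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \mathop{\mathbf{E}}_{\boldsymbol{U}} (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{i,i}, \qquad (17)
\end{aligned}$$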

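using Lemma 3.1.

¹ To be careful, we may exclude all those $i$ for which $\lambda + e_i$ is an invalid partition and thus $R_i = 0$.

Then by Corollary 4.7,

$$\begin{aligned}
\mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \mathop{\mathbf{E}}_{\boldsymbol{U}} (\boldsymbol{U}^\dagger \rho \boldsymbol{U})_{i,i}
&\ge \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{\Phi_{\boldsymbol{\lambda}+e_i}(\alpha)}{\Phi_{\boldsymbol{\lambda}}(\alpha)}
= \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{s_{\boldsymbol{\lambda}+e_i}(\alpha)}{s_{\boldsymbol{\lambda}}(\alpha)} \cdot \frac{s_{\boldsymbol{\lambda}}(1, \ldots, 1)}{s_{\boldsymbol{\lambda}+e_i}(1, \ldots, 1)} \\
&\ge \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{s_{\boldsymbol{\lambda}+e_i}(\alpha)}{s_{\boldsymbol{\lambda}}(\alpha)} \left( 2 - \frac{s_{\boldsymbol{\lambda}+e_i}(1, \ldots, 1)}{s_{\boldsymbol{\lambda}}(1, \ldots, 1)} \right) \\
&= 2 \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{s_{\boldsymbol{\lambda}+e_i}(\alpha)}{s_{\boldsymbol{\lambda}}(\alpha)} - \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{s_{\boldsymbol{\lambda}+e_i}(\alpha)}{s_{\boldsymbol{\lambda}}(\alpha)} \cdot \frac{s_{\boldsymbol{\lambda}+e_i}(1, \ldots, 1)}{s_{\boldsymbol{\lambda}}(1, \ldots, 1)}, \qquad (18)
\end{aligned}$$

where we used $\frac{1}{r} \ge 2 - r$ for $r > 0$. We lower-bound the first term in (18) by first using the
inequality (2) and Proposition 2.1, and then using inequality (2) and Lemma 3.2 (as in the proof
of Theorem 1.1):

$$2 \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{s_{\boldsymbol{\lambda}+e_i}(\alpha)}{s_{\boldsymbol{\lambda}}(\alpha)} \ge 2 \mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \alpha_i \ge 2n \sum_{i=1}^{d} \alpha_i^2. \qquad (19)$$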

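As for the second term in (18), we use (8) and the first formula in (1) to compute

$$\begin{aligned}
\mathop{\mathbf{E}}_{\boldsymbol{\lambda}} \sum_{i=1}^{d} \boldsymbol{\lambda}_i \frac{s_{\boldsymbol{\lambda}+e_i}(\alpha)}{s_{\boldsymbol{\lambda}}(\alpha)} \cdot \frac{s_{\boldsymbol{\lambda}+e_i}(1, \ldots, 1)}{s_{\boldsymbol{\lambda}}(1, \ldots, 1)}
&= \sum_{i=1}^{d} \sum_{\lambda \vdash n} \dim(\lambda) s_\lambda(\alpha) \cdot \lambda_i \cdot \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \cdot \frac{\dim(\lambda+e_i)(d + \lambda_i - i + 1)}{\dim(\lambda)(n+1)} \\
&= \sum_{i=1}^{d} \sum_{\lambda \vdash n} \dim(\lambda+e_i) s_{\lambda+e_i}(\alpha) \cdot \frac{\lambda_i (d - i + \lambda_i + 1)}{n+1} \\
&\le \sum_{i=1}^{d} \mathop{\mathbf{E}}_{\boldsymbol{\lambda}' \sim \mathrm{SW}^{n+1}(\alpha)} \frac{(\boldsymbol{\lambda}'_i - 1)(d - i + \boldsymbol{\lambda}'_i)}{n+1} \qquad \text{(by (8) again)} \\
&\le \frac{1}{n+1} \left( \mathop{\mathbf{E}}_{\boldsymbol{\lambda}' \sim \mathrm{SW}^{n+1}(\alpha)} \sum_{i=1}^{d} (\boldsymbol{\lambda}'_i)^2 + \mathop{\mathbf{E}}_{\boldsymbol{\lambda}' \sim \mathrm{SW}^{n+1}(\alpha)} \sum_{i=1}^{d} (d - i - 1) \boldsymbol{\lambda}'_i \right) \\
&\le \frac{1}{n+1} \left( (n+1) n \sum_{i=1}^{d} \alpha_i^2 + \sum_{i=1}^{d} (d + i - 2)((n+1)/d) \right) \\
&= n \sum_{i=1}^{d} \alpha_i^2 + \tfrac{3}{2} d - \tfrac{3}{2}, \qquad (20)
\end{aligned}$$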

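where the last inequality is deduced exactly as in the proof of Lemma 3.1. Finally, combining
(17)-(20) we get

$$n^2 \cdot \mathop{\mathbf{E}}_{\boldsymbol{\lambda}, \boldsymbol{U}} \|\boldsymbol{U} \mathrm{diag}(\underline{\boldsymbol{\lambda}}) \boldsymbol{U}^\dagger - \rho\|_F^2 \le 4dn - 3n.$$

Dividing both sides by $n^2$ completes the proof. □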


5 Truncated spectrum estimation 


In this section we prove Theorem 1.7, from which Corollary 1.8 follows in the same way as
Corollary 1.3. The key lemma involved is the following:

Lemma 5.1. Let $\alpha \in \mathbb{R}^d$ be a sorted probability distribution. Then for any $k \in [d]$,

$$\mathop{\mathbf{E}}_{\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)} \sum_{i=1}^{k} \boldsymbol{\lambda}_i \le \sum_{i=1}^{k} \alpha_i n + 2\sqrt{2}\, k \sqrt{n}.$$

We remark that it is easy to lower-bound this expectation by $\sum_{i=1}^{k} \alpha_i n$ via Lemma 3.2. We
now show how to deduce Theorem 1.7 from Lemma 5.1. Then in Section 5.1 we prove the lemma.

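Proof of Theorem 1.7. Let $\boldsymbol{w} \sim \alpha^{\otimes n}$, let $\mathrm{RSK}(\boldsymbol{w}) = (\boldsymbol{P}, \boldsymbol{Q})$, and let $\boldsymbol{\lambda} = \mathrm{sh}(\boldsymbol{P})$, so $\boldsymbol{\lambda} \sim \mathrm{SW}^n(\alpha)$.
Write $\boldsymbol{w}'$ for the string formed from $\boldsymbol{w}$ by deleting all letters bigger than $k$. Then it is a basic
property of the RSK algorithm that $\mathrm{RSK}(\boldsymbol{w}')$ produces the insertion tableau $\boldsymbol{P}'$ formed from $\boldsymbol{P}$
by deleting all boxes with labels bigger than $k$. Thus $\boldsymbol{\lambda}' = \mathrm{sh}(\boldsymbol{P}') = \mathrm{shRSK}(\boldsymbol{w}')$. Denoting
$\alpha_{[k]} = \alpha_1 + \cdots + \alpha_k$, we have $\boldsymbol{\lambda}' \sim \mathrm{SW}^{\boldsymbol{m}}(\alpha')$, where $\boldsymbol{m} \sim \mathrm{Binomial}(n, \alpha_{[k]})$ and $\alpha'$ denotes $\alpha$
conditioned on the first $k$ letters; i.e., $\alpha' = (\alpha_i/\alpha_{[k]})_{i=1}^{k}$. Now by the triangle inequality,

$$2n \cdot \mathbf{E}\, d_{\mathrm{TV}}^{(k)}(\underline{\boldsymbol{\lambda}}, \alpha) = \mathbf{E} \sum_{i=1}^{k} |\boldsymbol{\lambda}_i - \alpha_i n| \le \mathbf{E} \sum_{i=1}^{k} (\boldsymbol{\lambda}_i - \boldsymbol{\lambda}'_i) + \mathbf{E} \sum_{i=1}^{k} |\boldsymbol{\lambda}'_i - \alpha'_i \boldsymbol{m}| + \mathbf{E} \sum_{i=1}^{k} |\alpha'_i \boldsymbol{m} - \alpha_i n|. \qquad (21)$$

The first quantity in (21) is at most $2\sqrt{2}\, k \sqrt{n}$, using Lemma 5.1 and the fact that $\mathbf{E}[\sum_{i=1}^{k} \boldsymbol{\lambda}'_i] =
\mathbf{E}[\boldsymbol{m}] = \sum_{i=1}^{k} \alpha_i n$. The second quantity in (21) is at most $k \sqrt{n}$ using Theorem 1.1:

$$\mathbf{E} \sum_{i=1}^{k} |\boldsymbol{\lambda}'_i - \alpha'_i \boldsymbol{m}| = \mathop{\mathbf{E}}_{\boldsymbol{m}}\, \boldsymbol{m} \cdot \mathop{\mathbf{E}}_{\boldsymbol{\lambda}'} \|\underline{\boldsymbol{\lambda}}' - \alpha'\|_1 \le \mathop{\mathbf{E}}_{\boldsymbol{m}}\, \boldsymbol{m} \sqrt{k} \sqrt{\mathop{\mathbf{E}}_{\boldsymbol{\lambda}'} \|\underline{\boldsymbol{\lambda}}' - \alpha'\|_2^2} \le k \mathop{\mathbf{E}}_{\boldsymbol{m}} \sqrt{\boldsymbol{m}} \le k \sqrt{n}.$$

And the third quantity in (21) is at most $\sqrt{n}$:

$$\mathbf{E} \sum_{i=1}^{k} |\alpha'_i \boldsymbol{m} - \alpha_i n| = \mathop{\mathbf{E}}_{\boldsymbol{m}} \sum_{i=1}^{k} \frac{\alpha_i}{\alpha_{[k]}} |\boldsymbol{m} - \alpha_{[k]} n| = \mathop{\mathbf{E}}_{\boldsymbol{m}} |\boldsymbol{m} - \alpha_{[k]} n| \le \mathrm{stddev}(\boldsymbol{m}) \le \sqrt{n}.$$

Thus $2n \cdot \mathbf{E}\, d_{\mathrm{TV}}^{(k)}(\underline{\boldsymbol{\lambda}}, \alpha) \le ((2\sqrt{2} + 1)k + 1)\sqrt{n}$, and dividing by $2n$ completes the proof. □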

5.1 Proof of Lemma 5.1 

Our proof of Lemma 5.1 is essentially by reduction to the case when a is the uniform distribution 
and k = 1. We thus begin by analyzing the uniform distribution. 


5.1.1 The uniform distribution case 


In this subsection we will use the abbreviation (1/d) for the uniform distribution (1/d,... ,1/d) 
on [d]. Our goal is the following fact, which is of independent interest: 

Theorem 5.2. $\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n(1/d)} \lambda_1 \le n/d + 2\sqrt{n}$.

We remark that Theorem 5.2 implies Lemma 5.1 (with a slightly better constant) in the case of $\alpha = (1/d, \dots, 1/d)$, since of course $\lambda_i \le \lambda_1$ for all $i \in [k]$. Also, by taking $d \to \infty$ we recover the well-known fact that $\mathbf{E}\, \lambda_1 \le 2\sqrt{n}$ when $\lambda$ has the Plancherel distribution. Indeed, our proof of
Theorem 5.2 extends the original proof of this fact by Vershik and Kerov [VK85] (cf. the exposition 
in [Romll]). 
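As an empirical illustration of Theorem 5.2, one can sample uniform random strings and average the first-row length $\lambda_1$, which equals the length of the longest weakly increasing subsequence. The sketch below is a Monte Carlo check under our own naming (`first_row_length`, `mean_first_row`, `theorem_5_2_bound` are hypothetical helpers); the sample mean only approximates the expectation.

```python
from bisect import bisect_right
import math
import random

def first_row_length(word):
    """lambda_1: length of the longest weakly increasing subsequence of `word`."""
    tails = []                               # first row of the RSK insertion tableau
    for x in word:
        pos = bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def mean_first_row(n, d, trials=200, seed=0):
    """Sample mean of lambda_1 over uniform random strings in [d]^n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += first_row_length([rng.randrange(d) for _ in range(n)])
    return total / trials

def theorem_5_2_bound(n, d):
    """Right-hand side n/d + 2*sqrt(n) of Theorem 5.2."""
    return n / d + 2 * math.sqrt(n)
```

For moderate $n$ and $d$ the sample mean should land between the trivial lower bound $n/d$ and `theorem_5_2_bound(n, d)`.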


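Proof. Consider the Schur-Weyl growth process under the uniform distribution $(1/d, \dots, 1/d)$ on $[d]$. For $m \ge 1$ we define

$$\delta_m = \mathbf{E}[\lambda^{(m)}_1 - \lambda^{(m-1)}_1] = \Pr[\text{the $m$th box enters into the 1st row}] = \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^{m-1}(1/d)} \frac{s_{\lambda+e_1}(1/d)}{s_\lambda(1/d)},$$

where we used (5). By Cauchy-Schwarz and identity (8),

$$\delta_m \le \sqrt{\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^{m-1}(1/d)} \left(\frac{s_{\lambda+e_1}(1/d)}{s_\lambda(1/d)}\right)^2} = \sqrt{\sum_{\lambda \vdash m-1} \dim(\lambda)\, s_\lambda(1/d) \cdot \left(\frac{s_{\lambda+e_1}(1/d)}{s_\lambda(1/d)}\right)^2}$$

$$= \sqrt{\sum_{\lambda \vdash m-1} \dim(\lambda+e_1)\, s_{\lambda+e_1}(1/d) \cdot \frac{\dim(\lambda)\, s_{\lambda+e_1}(1/d)}{\dim(\lambda+e_1)\, s_\lambda(1/d)}} \qquad (22)$$

$$\le \sqrt{\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^{m}(1/d)} \frac{d + \lambda_1}{dm}} = \sqrt{\frac{d + \delta_1 + \cdots + \delta_m}{dm}},$$

where the ratio in (22) was computed using the first formula of (1) (and the homogeneity of Schur polynomials). Thus we have established the following recurrence: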


$$\delta_m \le \frac{1}{\sqrt{dm}} \sqrt{d + \delta_1 + \cdots + \delta_m}. \qquad (23)$$


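We will now show by induction that $\delta_m \le \frac{1}{d} + \frac{1}{\sqrt{m}}$ for all $m \ge 1$. Note that this will complete the proof, by summing over $m \in [n]$. The base case, $m = 1$, is immediate since $\delta_1 = 1$. For general $m > 1$, think of $\delta_1, \dots, \delta_{m-1}$ as fixed and $\delta_m$ as variable. Now if $\delta_m$ satisfies (23), it is bounded above by the (positive) solution $\delta^*$ of

$$\delta = \frac{1}{\sqrt{dm}} \sqrt{c + \delta}, \quad \text{where } c = d + \delta_1 + \cdots + \delta_{m-1}.$$

Note that if $\delta > 0$ satisfies

$$\delta \ge \frac{1}{\sqrt{dm}} \sqrt{c + \delta} \qquad (24)$$

then it must be that $\delta \ge \delta^* \ge \delta_m$. Thus it suffices to show that (24) holds for $\delta = \frac{1}{d} + \frac{1}{\sqrt{m}}$. But indeed,

$$\frac{1}{\sqrt{dm}} \sqrt{d + \delta_1 + \cdots + \delta_{m-1} + \frac{1}{d} + \frac{1}{\sqrt{m}}} \le \frac{1}{\sqrt{dm}} \sqrt{d + \frac{m}{d} + 2\sqrt{m}} = \frac{1}{\sqrt{dm}} \left(\sqrt{d} + \sqrt{\frac{m}{d}}\right) = \frac{1}{d} + \frac{1}{\sqrt{m}},$$

where the first inequality used induction. The proof is complete. □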
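The induction can also be explored numerically. The sketch below (our own helper `worst_case_increments`, not part of the paper) builds the largest sequence permitted by $\delta_1 = 1$ and recurrence (23), taking at each step the positive root $\delta^*$ of $\delta = \sqrt{(c + \delta)/(dm)}$, so that each term can be compared against $\frac{1}{d} + \frac{1}{\sqrt{m}}$.

```python
import math

def worst_case_increments(d, n):
    """Largest delta_1..delta_n allowed by delta_1 = 1 and recurrence (23)."""
    deltas, total = [1.0], 1.0
    for m in range(2, n + 1):
        c = d + total                        # c = d + delta_1 + ... + delta_{m-1}
        # positive root of d*m*x^2 - x - c = 0, i.e. of x = sqrt((c + x) / (d*m))
        x = (1 + math.sqrt(1 + 4 * d * m * c)) / (2 * d * m)
        deltas.append(x)
        total += x
    return deltas
```

Summing the returned list gives an upper bound on $\mathbf{E}\,\lambda_1$ that can be compared with $n/d + 2\sqrt{n}$.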


5.1.2 Reduction to the uniform case 

Proof of Lemma 5.1. Given the sorted distribution $\alpha$ on $[d]$, let $\beta$ be the sorted probability distribution on $[d]$ defined, for an appropriate value of $m$, as

$$\beta_1 = \alpha_1, \ \dots, \ \beta_k = \alpha_k, \quad \beta_{k+1} = \cdots = \beta_m = \alpha_{k+1} \ge \beta_{m+1} \ge 0, \quad \beta_{m+2} = \cdots = \beta_d = 0.$$

In other words, $\beta$ agrees with $\alpha$ on the first $k$ letters and is otherwise uniform, except for possibly a small "bump" at $\beta_{m+1}$. By construction we have $\beta \succ \alpha$. Thus it follows from our coupling result, Theorem 1.11, that

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n(\alpha)} \sum_{i=1}^k \lambda_i \le \mathop{\mathbf{E}}_{\mu \sim \mathrm{SW}^n(\beta)} \sum_{i=1}^k \mu_i,$$

and hence it suffices to prove the lemma for $\beta$ in place of $\alpha$. Observe that $\beta$ can be expressed as a mixture

$$\beta = p_1 \cdot \mathcal{D}_1 + p_2 \cdot \mathcal{D}_2 + p_3 \cdot \mathcal{D}_3, \qquad (25)$$

of a certain distribution $\mathcal{D}_1$ supported on $[k]$, the uniform distribution $\mathcal{D}_2$ on $[m]$, and the uniform distribution $\mathcal{D}_3$ on $[m+1]$. We may therefore think of a draw $\mu \sim \mathrm{SW}^n(\beta)$ occurring as follows. First, $[n]$ is partitioned into three subsets $I_1, I_2, I_3$ by including each $i \in [n]$ into $I_j$ independently with probability $p_j$. Next we draw strings $w^{(j)} \sim \mathcal{D}_j^{\otimes I_j}$ independently for $j \in [3]$. Finally, we let $w = (w^{(1)}, w^{(2)}, w^{(3)}) \in [d]^n$ be the natural composite string and define $\mu = \mathrm{shRSK}(w)$. Let us also write $\mu^{(j)} = \mathrm{shRSK}(w^{(j)})$ for $j \in [3]$. We now claim that

$$\sum_{i=1}^k \mu_i \le \sum_{i=1}^k \mu^{(1)}_i + \sum_{i=1}^k \mu^{(2)}_i + \sum_{i=1}^k \mu^{(3)}_i$$

always holds. Indeed, this follows from Greene's Theorem: the left-hand side is $|s|$, where $s \in [d]^n$ is a maximum-length disjoint union of $k$ increasing subsequences in $w$; the projection $s^{(j)}$ of $s$ onto coordinates $I_j$ is a disjoint union of $k$ increasing subsequences in $w^{(j)}$, and hence the right-hand side is at least $|s^{(1)}| + |s^{(2)}| + |s^{(3)}| = |s|$. Thus to complete the proof of the lemma, it suffices to show

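$$\mathbf{E} \sum_{i=1}^k \mu^{(1)}_i + \mathbf{E} \sum_{i=1}^k \mu^{(2)}_i + \mathbf{E} \sum_{i=1}^k \mu^{(3)}_i \le \sum_{i=1}^k \alpha_i n + 2\sqrt{2}\,k\sqrt{n}. \qquad (26)$$

Since $\mathcal{D}_1$ is supported on $[k]$, the first expectation above is equal to $\mathbf{E}[|w^{(1)}|] = p_1 n$. By (the remark just after) Theorem 5.2, we can bound the second expectation as

$$\mathbf{E} \sum_{i=1}^k \mu^{(2)}_i \le k\, \mathbf{E}\, \mu^{(2)}_1 \le k\, \mathbf{E}\, |w^{(2)}|/m + 2k\, \mathbf{E} \sqrt{|w^{(2)}|} \le k(p_2 n)/m + 2k\sqrt{p_2 n}.$$

Similarly the third expectation in (26) is bounded by $k(p_3 n)/(m+1) + 2k\sqrt{p_3 n}$. Using $\sqrt{p_2} + \sqrt{p_3} \le \sqrt{2}$, we have upper-bounded the left-hand side of (26) by

$$\left(p_1 + p_2 \frac{k}{m} + p_3 \frac{k}{m+1}\right) n + 2\sqrt{2}\,k\sqrt{n} = \sum_{i=1}^k \alpha_i n + 2\sqrt{2}\,k\sqrt{n},$$

as required. □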
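The distribution $\beta$ used in this reduction is simple to construct explicitly. The following sketch (with hypothetical helper names `flatten_tail` and `majorizes`, and exact rational arithmetic chosen for convenience) builds $\beta$ from a sorted $\alpha$ and $k$, and checks the majorization $\beta \succ \alpha$ through prefix sums.

```python
from fractions import Fraction
from itertools import accumulate

def flatten_tail(alpha, k):
    """Keep alpha_1..alpha_k, then repeat alpha_{k+1} until the mass runs out.

    `alpha` is a list of Fractions, sorted in decreasing order and summing to 1.
    """
    d = len(alpha)
    beta = list(alpha[:k])
    rest = 1 - sum(beta)                       # mass still to be placed
    level = alpha[k] if k < d else Fraction(0)
    while len(beta) < d:
        piece = min(level, rest)               # full copy, final "bump", or zero
        beta.append(piece)
        rest -= piece
    return beta

def majorizes(beta, alpha):
    """True if every prefix sum of beta is at least that of alpha (both sorted)."""
    return all(b >= a for b, a in zip(accumulate(beta), accumulate(alpha)))
```

For instance, with $\alpha = (2/5, 3/10, 1/5, 1/10)$ and $k = 1$ this construction yields $\beta = (2/5, 3/10, 3/10, 0)$.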

6 Principal component analysis 

In this section we analyze a straightforward modification to Keyl's tomography algorithm that allows us to perform principal component analysis on an unknown density matrix $\rho \in \mathbb{C}^{d \times d}$. The PCA algorithm is the same as Keyl's algorithm, except that having measured $\lambda$ and $U$, it outputs the rank-$k$ matrix $U \mathrm{diag}^{(k)}(\underline{\lambda}) U^\dagger$ rather than the potentially full-rank matrix $U \mathrm{diag}(\underline{\lambda}) U^\dagger$. Here we recall the notation $\mathrm{diag}^{(k)}(\lambda)$ for the $d \times d$ matrix $\mathrm{diag}(\lambda_1, \dots, \lambda_k, 0, \dots, 0)$.

Before giving the proof of Theorem 1.5, let us show why the case of Frobenius-norm PCA appears to be less interesting than the case of trace-distance PCA. The goal for Frobenius PCA would be to output a rank-$k$ matrix $\hat{\rho}$ satisfying

$$\|\hat{\rho} - \rho\|_F \le \sqrt{\alpha_{k+1}^2 + \cdots + \alpha_d^2} + \epsilon,$$

with high probability, while trying to minimize the number of copies $n$ as a function of $k$, $d$, and $\epsilon$. However, even when $\rho$ is guaranteed to be of rank 1, it is likely that any algorithm will require $n = \Omega(d/\epsilon^2)$ copies to output an $\epsilon$-accurate rank-1 approximator $\hat{\rho}$. This is because such an approximator will satisfy $\|\hat{\rho} - \rho\|_1 \le \sqrt{2} \cdot \|\hat{\rho} - \rho\|_F = O(\epsilon)$, and it is likely that $n = \Omega(d/\epsilon^2)$ copies of $\rho$ are required for such a guarantee (see, for example, the lower bounds of [HHJ+15], which show that $n = \Omega\!\left(\frac{d}{\epsilon^2 \log(d/\epsilon)}\right)$ copies are necessary for tomography of rank-1 states). Thus, even in the simplest case of rank-1 PCA of rank-1 states, we probably cannot improve on the $n = O(d/\epsilon^2)$ copy complexity for full tomography given by Corollary 1.4.

Now we prove Theorem 1.5. We note that the proof shares many of its steps with the proof of 
Theorem 1.2. 

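Proof of Theorem 1.5. Throughout the proof we assume $\lambda \sim \mathrm{SW}^n(\alpha)$ and $U \sim K_\lambda(\rho)$. We write $R$ for the lower-right $(d-k) \times (d-k)$ submatrix of $U^\dagger \rho U$ and we write $\Gamma = U^\dagger \rho U - R$. Then

$$\mathop{\mathbf{E}}_{\lambda, U} \|U \mathrm{diag}^{(k)}(\underline{\lambda}) U^\dagger - \rho\|_1 = \mathop{\mathbf{E}}_{\lambda, U} \|\mathrm{diag}^{(k)}(\underline{\lambda}) - U^\dagger \rho U\|_1 \le \mathop{\mathbf{E}}_{\lambda, U} \|\mathrm{diag}^{(k)}(\underline{\lambda}) - \Gamma\|_1 + \mathop{\mathbf{E}}_{\lambda, U} \|R\|_1. \qquad (27)$$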

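We can upper-bound the first term in (27) using

$$\mathop{\mathbf{E}}_{\lambda, U} \|\mathrm{diag}^{(k)}(\underline{\lambda}) - \Gamma\|_1 \le \sqrt{2k} \mathop{\mathbf{E}}_{\lambda, U} \|\mathrm{diag}^{(k)}(\underline{\lambda}) - \Gamma\|_F \le \sqrt{2k} \mathop{\mathbf{E}}_{\lambda, U} \|\mathrm{diag}(\underline{\lambda}) - U^\dagger \rho U\|_F \le \sqrt{\frac{8kd}{n}}. \qquad (28)$$

The first inequality is Cauchy-Schwarz together with the fact that $\mathrm{rank}(\mathrm{diag}^{(k)}(\underline{\lambda}) - \Gamma) \le 2k$ (since the matrix is nonzero only in its first $k$ rows and columns). The second inequality uses that $\mathrm{diag}(\underline{\lambda}) - U^\dagger \rho U$ is formed from $\mathrm{diag}^{(k)}(\underline{\lambda}) - \Gamma$ by adding a matrix, $\mathrm{diag}(\underline{\lambda}) - \mathrm{diag}^{(k)}(\underline{\lambda}) - R$, of disjoint support; this can only increase the squared Frobenius norm (sum of squares of entries). Finally, the third inequality uses Theorem 1.2. To analyze the second term in (27), we note that $R$ is a principal submatrix of $U^\dagger \rho U$, and so it is positive semidefinite. As a result,

$$\mathop{\mathbf{E}}_{\lambda, U} \|R\|_1 = \mathop{\mathbf{E}}_{\lambda, U} \mathrm{tr}(R) = 1 - \mathop{\mathbf{E}}_{\lambda, U} \mathrm{tr}(\Gamma). \qquad (29)$$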

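By Corollary 4.7,

$$\mathop{\mathbf{E}}_{\lambda, U} \mathrm{tr}(\Gamma) = \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \mathop{\mathbf{E}}_{U} (U^\dagger \rho U)_{ii} \ge \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{\Phi_{\lambda+e_i}(\alpha)}{\Phi_\lambda(\alpha)} = \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \cdot \frac{s_\lambda(1, \dots, 1)}{s_{\lambda+e_i}(1, \dots, 1)}$$

$$\ge \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \left(2 - \frac{s_{\lambda+e_i}(1, \dots, 1)}{s_\lambda(1, \dots, 1)}\right) = 2 \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} - \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \cdot \frac{s_{\lambda+e_i}(1, \dots, 1)}{s_\lambda(1, \dots, 1)}, \qquad (30)$$

where we used $r \ge 2 - \frac{1}{r}$ for $r > 0$. The first term here is lower-bounded using Proposition 2.1:

$$2 \mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \ge 2 \sum_{i=1}^k \alpha_i. \qquad (31)$$

As for the second term in (30), we use (8) and the first formula in (1) to compute

$$\mathop{\mathbf{E}}_{\lambda} \sum_{i=1}^k \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \cdot \frac{s_{\lambda+e_i}(1, \dots, 1)}{s_\lambda(1, \dots, 1)} = \sum_{i=1}^k \sum_{\lambda \vdash n} \dim(\lambda)\, s_\lambda(\alpha) \cdot \frac{s_{\lambda+e_i}(\alpha)}{s_\lambda(\alpha)} \cdot \frac{\dim(\lambda+e_i)(d + \lambda_i - i + 1)}{\dim(\lambda)(n+1)}$$

$$= \sum_{i=1}^k \sum_{\lambda \vdash n} \dim(\lambda+e_i)\, s_{\lambda+e_i}(\alpha) \cdot \frac{d - i + \lambda_i + 1}{n+1}$$

$$\le \sum_{i=1}^k \mathop{\mathbf{E}}_{\lambda' \sim \mathrm{SW}^{n+1}(\alpha)} \frac{d - i + \lambda'_i}{n+1} \qquad \text{(by (8) again)}$$

$$\le \frac{1}{n+1} \mathop{\mathbf{E}}_{\lambda' \sim \mathrm{SW}^{n+1}(\alpha)} \sum_{i=1}^k \lambda'_i + \frac{kd}{n}$$

$$\le \sum_{i=1}^k \alpha_i + \frac{2\sqrt{2}\,k}{\sqrt{n}} + \frac{kd}{n}, \qquad (32)$$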


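where the last step is by Lemma 5.1. Combining (27)-(32) we get

$$\mathop{\mathbf{E}}_{\lambda, U} \|U \mathrm{diag}^{(k)}(\underline{\lambda}) U^\dagger - \rho\|_1 \le \left(1 - \sum_{i=1}^k \alpha_i\right) + \sqrt{\frac{8kd}{n}} + \frac{2\sqrt{2}\,k}{\sqrt{n}} + \frac{kd}{n} \le \sum_{i=k+1}^d \alpha_i + \sqrt{\frac{32kd}{n}} + \frac{kd}{n},$$

where the second inequality used $k \le \sqrt{kd}$. Finally, as the expectation is also trivially upper-bounded by 2, we may use $6\sqrt{r} \ge \min(2, \sqrt{32r} + r)$ (which holds for all $r \ge 0$) to conclude

$$\mathop{\mathbf{E}}_{\lambda, U} \|U \mathrm{diag}^{(k)}(\underline{\lambda}) U^\dagger - \rho\|_1 \le \sum_{i=k+1}^d \alpha_i + 6\sqrt{\frac{kd}{n}}.$$

□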
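The elementary inequality used in the last step, $6\sqrt{r} \ge \min(2, \sqrt{32r} + r)$ with $r = kd/n$, can be spot-checked on a grid, and the resulting bound is easy to evaluate. The sketch below uses our own helper names (`final_step_slack`, `pca_error_bound`) and is only a numerical illustration.

```python
import math

def final_step_slack(r):
    """6*sqrt(r) - min(2, sqrt(32 r) + r); expected to be nonnegative for r >= 0."""
    return 6 * math.sqrt(r) - min(2.0, math.sqrt(32 * r) + r)

def pca_error_bound(alpha, k, n):
    """Tail mass sum_{i>k} alpha_i plus 6*sqrt(k*d/n), for a sorted spectrum alpha."""
    d = len(alpha)
    return sum(alpha[k:]) + 6 * math.sqrt(k * d / n)
```

For example, for the spectrum $(0.5, 0.3, 0.2)$ with $k = 1$ and $n = 300$, the bound evaluates to $0.5 + 6\sqrt{3/300} = 1.1$.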


7 Majorization for the RSK algorithm 

In this section we prove Theorem 1.11. The key to the proof will be the following strengthened 
version of the d = 2 case, which we believe is of independent interest. 

Theorem 7.1. Let $0 \le p, q \le 1$ satisfy $|q - \frac{1}{2}| \ge |p - \frac{1}{2}|$; in other words, the $q$-biased probability distribution $(q, 1-q)$ on $\{1, 2\}$ is "more extreme" than the $p$-biased distribution $(p, 1-p)$. Then for any $n \in \mathbb{N}$ there is a coupling $(w, x)$ of the $p$-biased distribution on $\{1,2\}^n$ and the $q$-biased distribution on $\{1,2\}^n$ such that for all $1 \le i \le j \le n$ we have $\mathrm{LIS}(x[i..j]) \ge \mathrm{LIS}(w[i..j])$ always.

We now show how to prove Theorem 1.11 given Theorem 7.1. Then in the following subsections 
we will prove Theorem 7.1. 

Proof of Theorem 1.11 given Theorem 7.1. A classic result of Muirhead [Mui02] (see also [MOA11, B.1 Lemma]) says that $\beta \succ \alpha$ implies there is a sequence $\beta = \gamma_0 \succ \gamma_1 \succ \cdots \succ \gamma_t = \alpha$ such that $\gamma_i$ and $\gamma_{i+1}$ differ in at most 2 coordinates. Since the $\unrhd$ relation is transitive, by composing couplings it suffices to assume that $\alpha$ and $\beta$ themselves differ in at most two coordinates. Since the Schur-Weyl distribution is symmetric with respect to permutations of $[d]$, we may assume that these two coordinates are 1 and 2. Thus we may assume $\alpha = (\alpha_1, \alpha_2, \beta_3, \beta_4, \dots, \beta_d)$, where $\alpha_1 + \alpha_2 = \beta_1 + \beta_2$ and $\alpha_1, \alpha_2$ are between $\beta_1, \beta_2$.

We now define the coupling $(\lambda, \mu)$ as follows: We first choose a string $z \in (\{*\} \cup \{3, 4, \dots, d\})^n$ according to the product distribution in which symbol $j$ has probability $\beta_j$ for $j \ge 3$ and symbol $*$ has the remaining probability $\beta_1 + \beta_2$. Let $n_*$ denote the number of $*$'s in $z$. Next, we use Theorem 7.1 to choose coupled strings $(w, x)$ with the $p$-biased distribution on $\{1,2\}^{n_*}$ and the $q$-biased distribution on $\{1,2\}^{n_*}$ (respectively), where $p = \frac{\alpha_1}{\alpha_1 + \alpha_2}$ and $q = \frac{\beta_1}{\beta_1 + \beta_2}$. Note indeed that $|q - \frac{1}{2}| \ge |p - \frac{1}{2}|$, and hence $\mathrm{LIS}(x[i..j]) \ge \mathrm{LIS}(w[i..j])$ for all $1 \le i \le j \le n_*$. Now let "$z \cup w$" denote the string in $[d]^n$ obtained by filling in the $*$'s in $z$ with the symbols from $w$, in the natural left-to-right order; similarly define "$z \cup x$". Note that $z \cup w$ is distributed according to the product distribution $\alpha^{\otimes n}$ and likewise for $z \cup x$ and $\beta^{\otimes n}$. Our final coupling is now obtained by taking $\lambda = \mathrm{shRSK}(z \cup w)$ and $\mu = \mathrm{shRSK}(z \cup x)$. We need to show that $\mu \unrhd \lambda$ always.
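The deterministic part of this construction (filling the $*$'s and comparing RSK shapes) can be illustrated in code. The sketch below does not implement the coupling of Theorem 7.1 itself; it takes a pair $(w, x)$ as given, and the helper names `fill`, `rsk_shape` and `dominates` are ours.

```python
from bisect import bisect_right
from itertools import accumulate

def fill(z, w):
    """The string "z U w": replace the '*' entries of z, left to right, by w."""
    letters = iter(w)
    return [next(letters) if c == '*' else c for c in z]

def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word` (row insertion)."""
    rows = []
    for x in word:
        for row in rows:
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                break
            x, row[pos] = row[pos], x
        else:
            rows.append([x])
    return [len(row) for row in rows]

def dominates(mu, lam):
    """Dominance order: every prefix sum of mu is at least that of lam."""
    m = max(len(mu), len(lam))
    mu = list(mu) + [0] * (m - len(mu))
    lam = list(lam) + [0] * (m - len(lam))
    return all(a >= b for a, b in zip(accumulate(mu), accumulate(lam)))
```

As a usage example, take `z = [3, '*', '*', 3, '*', '*']`, `w = [2, 1, 2, 1]` and the sorted string `x = [1, 1, 2, 2]`, which LIS-dominates `w` on every substring; the two filled strings then have shapes $(2,2,2)$ and $(4,2)$, and the latter dominates the former.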

By Greene's Theorem, it suffices to show that if $s_1, \dots, s_k$ are disjoint increasing subsequences in $z \cup w$ of total length $S$, we can find $k$ disjoint increasing subsequences $s'_1, \dots, s'_k$ in $z \cup x$ of total length at least $S$. We first dispose of some simple cases. If none of $s_1, \dots, s_k$ contains any 1's or 2's, then we may take $s'_i = s_i$ for $i \in [k]$, since these subsequences all still appear in $z \cup x$. The case when exactly one of $s_1, \dots, s_k$ contains any 1's or 2's is also easy. Without loss of generality, say that $s_k$ is the only subsequence containing 1's and 2's. We may partition it as $(t, u)$, where $t$ is a subsequence of $w$ and $u$ is a subsequence of the non-$*$'s in $z$ that follow $w$. Now let $t'$ be the longest increasing subsequence in $x$. As $t$ is an increasing subsequence of $w$, we know that $t'$ is at least as long as $t$. Further, $(t', u)$ is an increasing subsequence in $z \cup x$. Thus we may take $s'_i = s_i$ for $i < k$, and $s'_k = (t', u)$.

We now come to the main case, when at least two of $s_1, \dots, s_k$ contain 1's and/or 2's. Let's first look at the position $j \in [n]$ of the rightmost 1 or 2 among $s_1, \dots, s_k$. Without loss of generality, assume it occurs in $s_k$. Next, look at the position $i \in [n]$ of the rightmost 1 or 2 among $s_1, \dots, s_{k-1}$. Without loss of generality, assume it occurs in $s_{k-1}$. We will now modify the subsequences $s_1, \dots, s_k$ as follows:

• all 1's and 2's are deleted from $s_1, \dots, s_{k-2}$ (note that these all occur prior to position $i$);

• $s_{k-1}$ is changed to consist of all the 2's within $(z \cup w)[1..i]$;

• the portion of $s_k$ to the right of position $i$ is unchanged, but the preceding portion is changed to consist of all the 1's within $(z \cup w)[1..i]$.

It is easy to see that the new $s_1, \dots, s_k$ remain disjoint subsequences of $z \cup w$, with total length at least $S$. We may also assume that the portion of $s_k$ between positions $i + 1$ and $j$ consists of a longest increasing subsequence of $w$.

Since the subsequences $s_1, \dots, s_{k-2}$ don't contain any 1's or 2's, they still appear in $z \cup x$, and we may take these as our $s'_1, \dots, s'_{k-2}$. We will also define $s'_{k-1}$ to consist of all 2's within $(z \cup x)[1..i]$. Finally, we will define $s'_k$ to consist of all 1's within $(z \cup x)[1..i]$, followed by the longest increasing subsequence of $x$ occurring within positions $(i+1)..j$ in $z \cup x$, followed by the portion of $s_k$ to the right of position $j$ (which does not contain any 1's or 2's and hence is still in $z \cup x$). It is clear that $s'_1, \dots, s'_k$ are indeed disjoint increasing subsequences of $z \cup x$. Their total length is the sum of four quantities:

• the total length of $s_1, \dots, s_{k-2}$;

• the total number of 1's and 2's within $(z \cup x)[1..i]$;

• the length of the longest increasing subsequence of $x$ occurring within positions $(i+1)..j$ in $z \cup x$;

• the length of the portion of $s_k$ to the right of position $j$.

By the coupling property of $(w, x)$, the third quantity above is at least the length of the longest increasing subsequence of $w$ occurring within positions $(i+1)..j$ in $z \cup w$. But this precisely shows that the total length of $s'_1, \dots, s'_k$ is at least that of $s_1, \dots, s_k$, as desired. □

7.1 Substring-LIS-dominance: RSK and Dyck paths 

In this subsection we make some preparatory definitions and observations toward proving Theorem 7.1. We begin by codifying the key property therein.

Definition 7.2. Let $w, w' \in \mathcal{A}^n$ be strings of equal length. We say $w'$ substring-LIS-dominates $w$, notated $w' \mathrel{\unrhd\!\unrhd} w$, if $\mathrm{LIS}(w'[i..j]) \ge \mathrm{LIS}(w[i..j])$ for all $1 \le i \le j \le n$. (Thus the coupling in Theorem 7.1 satisfies $x \mathrel{\unrhd\!\unrhd} w$ always.) The relation $\unrhd\!\unrhd$ is reflexive and transitive. If we have the substring-LIS-dominance condition just for $i = 1$ we say that $w'$ prefix-LIS-dominates $w$. If we have it just for $j = n$ we say that $w'$ suffix-LIS-dominates $w$.

Definition 7.3. For a string $w \in \mathcal{A}^n$ we write behead($w$) for $w[2..n]$ and curtail($w$) for $w[1..n-1]$.

Remark 7.4. We may equivalently define substring-LIS-dominance recursively, as follows. If $w'$ and $w$ have length 0 then $w' \mathrel{\unrhd\!\unrhd} w$. If $w'$ and $w$ have length $n > 0$, then $w' \mathrel{\unrhd\!\unrhd} w$ if and only if $\mathrm{LIS}(w') \ge \mathrm{LIS}(w)$ and behead($w'$) $\unrhd\!\unrhd$ behead($w$) and curtail($w'$) $\unrhd\!\unrhd$ curtail($w$). By omitting the second/third condition we get a recursive definition of prefix/suffix-LIS-dominance.
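The direct definition and the recursive one can be compared by brute force on short strings. In the sketch below, LIS is taken to mean the longest weakly increasing subsequence (matching the RSK convention used in this paper); the function names are ours, and the recursive version is written for clarity rather than speed.

```python
from bisect import bisect_right
from itertools import product

def lis(word):
    """Length of the longest weakly increasing subsequence of `word`."""
    tails = []
    for x in word:
        pos = bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def dominates_direct(wp, w):
    """Definition 7.2: compare LIS on every substring w[i..j]."""
    n = len(w)
    return all(lis(wp[i:j]) >= lis(w[i:j])
               for i in range(n) for j in range(i + 1, n + 1))

def dominates_recursive(wp, w):
    """Remark 7.4: compare LIS, then recurse on beheaded and curtailed strings."""
    if len(w) == 0:
        return True
    return (lis(wp) >= lis(w)
            and dominates_recursive(wp[1:], w[1:])      # behead both strings
            and dominates_recursive(wp[:-1], w[:-1]))   # curtail both strings
```

Both predicates take two strings of equal length (tuples or lists) and can be compared exhaustively, for example over all pairs in $\{1,2\}^4$.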

Definition 7.5. Let Q be a (nonempty) standard Young tableau. We define curtail(Q) to be the 
standard Young tableau obtained by deleting the box with maximum label from Q. 

The following fact is immediate from the definition of the RSK correspondence: 

Proposition 7.6. Let $w \in \mathcal{A}^n$ be a nonempty string. Suppose RSK($w$) = $(P, Q)$ and RSK(curtail($w$)) = $(P', Q')$. Then $Q'$ = curtail($Q$).

The analogous fact for beheading is more complicated. 

Definition 7.7. Let $Q$ be a (nonempty) standard Young tableau. We define behead($Q$) to be the standard Young tableau obtained by deleting the top-left box of $Q$, sliding the hole outside of the tableau according to jeu de taquin (see, e.g., [Ful97, Sag01]), and then decreasing all entries by 1. (The more traditional notation for behead($Q$) is $\Delta(Q)$.)

The following fact is due to [Sch63]; see [Sag01, Proposition 3.9.3] for an explicit proof.²

Proposition 7.8. Let $w \in \mathcal{A}^n$ be a nonempty string. Suppose RSK($w$) = $(P, Q)$ and RSK(behead($w$)) = $(P', Q')$. Then $Q'$ = behead($Q$).
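Propositions 7.6 and 7.8 lend themselves to a direct computational check. The sketch below is an illustration under our own naming (`recording_tableau`, `curtail_tableau`, `behead_tableau`); it assumes row insertion in which equal letters do not bump each other, which is the tie-breaking convention described in the footnote.

```python
from bisect import bisect_right

def recording_tableau(word):
    """Recording tableau Q of `word` under RSK row insertion."""
    P, Q = [], []
    for step, x in enumerate(word, start=1):
        for r, row in enumerate(P):
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                Q[r].append(step)
                break
            x, row[pos] = row[pos], x
        else:
            P.append([x])
            Q.append([step])
    return Q

def curtail_tableau(Q):
    """Definition 7.5: delete the box carrying the maximum label."""
    top = max(max(row) for row in Q)
    rows = [[v for v in row if v != top] for row in Q]
    return [row for row in rows if row]

def behead_tableau(Q):
    """Definition 7.7: remove the top-left box, slide the hole out, decrement."""
    T = [row[:] for row in Q]
    r = c = 0
    while True:
        right = T[r][c + 1] if c + 1 < len(T[r]) else None
        below = T[r + 1][c] if r + 1 < len(T) and c < len(T[r + 1]) else None
        if right is None and below is None:
            break                            # the hole has reached an outer corner
        if below is None or (right is not None and right < below):
            T[r][c] = right                  # slide the smaller neighbour into the hole
            c += 1
        else:
            T[r][c] = below
            r += 1
    T[r].pop()                               # discard the hole
    return [[v - 1 for v in row] for row in T if row]
```

For a nonempty word `w`, one can then compare `recording_tableau(w[:-1])` with `curtail_tableau(recording_tableau(w))`, and `recording_tableau(w[1:])` with `behead_tableau(recording_tableau(w))`.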

Proposition 7.9. Let $w, w' \in \mathcal{A}^n$ be strings of equal length and write RSK($w$) = $(P, Q)$, RSK($w'$) = $(P', Q')$. Then whether or not $w' \mathrel{\unrhd\!\unrhd} w$ can be determined just from the recording tableaus $Q'$ and $Q$.

Proof. This follows from the recursive definition of $\unrhd\!\unrhd$ given in Remark 7.4: whether $\mathrm{LIS}(w') \ge \mathrm{LIS}(w)$ can be determined by checking whether the first row of $Q'$ is at least as long as the first row of $Q$; the recursive checks can then be performed with the aid of Propositions 7.6, 7.8. □

² Technically, therein it is proved only for strings with distinct letters. One can recover the result for general strings in the standard manner; if the letters $w_i$ and $w_j$ are equal we break the tie by using the order relation on $i, j$. See also [vL13, Lemma].

Definition 7.10. In light of Proposition 7.9 we may define the relation $\unrhd\!\unrhd$ on standard Young tableaus.

Remark 7.11. The simplicity of Proposition 7.6 implies that it is very easy to tell, given $w, w' \in \mathcal{A}^n$ with recording tableaus $Q$ and $Q'$, whether $w'$ suffix-LIS-dominates $w$. One only needs to check whether $Q'_{1,j} \le Q_{1,j}$ for all $j \ge 1$ (treating empty entries as $\infty$). On the other hand, it is not particularly easy to tell from $Q'$ and $Q$ whether $w'$ prefix-LIS-dominates $w$; one seems to need to execute all of the jeu de taquin slides.

We henceforth focus attention on alphabets of size 2. Under RSK, these yield standard Young tableaus with at most 2 rows. (For brevity, we henceforth call these 2-row Young tableaus, even when they have fewer than 2 rows.) In turn, 2-row Young tableaus can be identified with Dyck paths (also known as ballot sequences).

Definition 7.12. We define a Dyck path of length $n$ to be a path in the $xy$-plane that starts from $(0,0)$, takes $n$ steps of the form $(+1,+1)$ (an upstep) or $(+1,-1)$ (a downstep), and never passes below the $x$-axis. We say that the height of a step $s$, written ht($s$), is the $y$-coordinate of its endpoint; the (final) height of a Dyck path $W$, written ht($W$), is the height of its last step. We do not require the final height of a path to be 0; if it is we call the path complete, and otherwise we call it incomplete. A return refers to a point where the path returns to the $x$-axis; i.e., to the end of a step of height 0. An arch refers to a minimal complete subpath of a Dyck path; i.e., a subpath between two consecutive returns (or between the origin and the first return).

Definition 7.13. We identify each 2-row standard Young tableau $Q$ of size $n$ with a Dyck path $W$ of length $n$. The identification is the standard one: reading off the entries of $Q$ from 1 to $n$, we add an upstep to $W$ when the entry is in the first row and a downstep when it is in the second row. The fact that this produces a Dyck path (i.e., the path does not pass below the $x$-axis) follows from the standard Young tableau property. Note that the final height of $W$ is the difference in length between $Q$'s two rows. We also naturally extend the terminology "return" to 2-row standard Young tableaus $Q$: a return is a second-row box labeled $2j$ such that boxes in $Q$ labeled $1, \dots, 2j$ form a rectangular $2 \times j$ standard Young tableau.
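The identification in Definition 7.13 is a one-line translation. The sketch below encodes an upstep as `+1` and a downstep as `-1`; this encoding and the helper names `tableau_to_path`, `heights` and `returns` are our own choices for illustration.

```python
def tableau_to_path(Q):
    """Dyck path (+1 upstep, -1 downstep) of a standard tableau with at most 2 rows."""
    second = set(Q[1]) if len(Q) > 1 else set()
    n = sum(len(row) for row in Q)
    return [-1 if label in second else +1 for label in range(1, n + 1)]

def heights(path):
    """Height ht(s) of each step s, i.e. the y-coordinate of its endpoint."""
    out, h = [], 0
    for step in path:
        h += step
        out.append(h)
    return out

def returns(path):
    """1-based labels of the steps that end on the x-axis."""
    return [i + 1 for i, h in enumerate(heights(path)) if h == 0]
```

For the tableau with rows $(1, 2, 4, 7)$ and $(3, 5, 6)$, the path has final height $4 - 3 = 1$ and a single return at the step labeled 6, matching the $2 \times 3$ rectangle formed by the boxes labeled $1, \dots, 6$.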

Definition 7.14. In light of Definition 7.10 and the above identification, we may define the relation $\unrhd\!\unrhd$ on Dyck paths.

Of course, we want to see how beheading and curtailment apply to Dyck paths. The following 
fact is immediate: 

Proposition 7.15. If $W$ is the Dyck path corresponding to a nonempty 2-row standard Young tableau $Q$, then the Dyck path $W'$ corresponding to curtail($Q$) is formed from $W$ by deleting its last segment. We write $W'$ = curtail($W$) for this new path.

Again, the case of beheading is more complicated. We first make some definitions. 

Definition 7.16. Raising refers to converting a downstep in a Dyck path to an upstep; note that 
this increases the Dyck path’s height by 2. Conversely, lowering refers to converting an upstep to a 
downstep. Generally, we only allow lowering when the result is still a Dyck path; i.e., never passes 
below the x-axis. 

Proposition 7.17. Let $Q$ be a nonempty 2-row standard Young tableau, with corresponding Dyck path $W$. Let $W'$ be the Dyck path corresponding to behead($Q$). Then $W'$ is formed from $W$ as follows: First, the initial step of $W$ is deleted (and the origin is shifted to the new initial point). If $W$ had no returns then the operation is complete and $W'$ is the resulting Dyck path. Otherwise, if $W$ had at least one return, then the step of its first return (which in the new path currently goes below the $x$-axis) is raised. In either case, we write $W'$ = behead($W$) for the resulting path.

Proof. We use Definitions 7.7 and 7.13. Deleting the top-left box of $Q$ corresponds to deleting the first step of $W$, and decreasing all entries in $Q$ by 1 corresponds to shifting the origin in $W$. Consider now the jeu de taquin slide in $Q$. The empty box stays in the first row until it first reaches a position $j$ such that $Q_{1,j+1} > Q_{2,j}$, if such a position exists. Such a position does exist if and only if $Q$ contains a return (with box $(2, j)$ being the first such return). If $Q$ (equivalently, $W$) has no return then the empty box slides out of the first row of $Q$, and indeed this corresponds to making no further changes to $W$. If $Q$ has its first return at box $(2, j)$, this means the jeu de taquin will slide up the box labeled $2j$ (corresponding to raising the first return step in $W$); then all remaining slides will be in the bottom row of $Q$, corresponding to no further changes to $W$. □
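Proposition 7.17 can be tested against RSK on all short binary words. In the sketch below, `word_to_path` computes the Dyck path of the recording tableau directly during row insertion and `behead_path` applies the path-level rule; both names and the `+1`/`-1` step encoding are our own.

```python
from bisect import bisect_right
from itertools import product

def word_to_path(word):
    """Dyck path of the RSK recording tableau of a word over a 2-letter alphabet."""
    P, path = [], []
    for x in word:
        for r, row in enumerate(P):
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                path.append(+1 if r == 0 else -1)    # new box in row 1 or row 2
                break
            x, row[pos] = row[pos], x
        else:
            P.append([x])
            path.append(+1 if not path else -1)      # first box, or a new second row
    return path

def behead_path(path):
    """Drop the first step, then raise the first return step (if there is one)."""
    rest, h = list(path[1:]), 0
    for i, step in enumerate(path):
        h += step
        if h == 0:                  # first return of the original path
            rest[i - 1] = +1        # the same step, re-indexed after the deletion
            break
    return rest
```

The expected relation is `word_to_path(w[1:]) == behead_path(word_to_path(w))` for every nonempty word `w` over $\{1, 2\}$.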

Remark 7.18. Similar to Remark 7.11, it is easy to "visually" check the suffix-LIS-domination relation for Dyck paths: $W'$ suffix-LIS-dominates $W$ if and only if $W'$ is at least as high as $W$ throughout the length of both paths. On the other hand, checking the full substring-LIS-domination relation is more involved; we have $W' \mathrel{\unrhd\!\unrhd} W$ if and only if for any number of simultaneous beheadings to $W'$ and $W$, the former path always stays at least as high as the latter.

Finally, we will require the following definition: 

Definition 7.19. A hinged range is a sequence $(R_0, s_1, R_1, s_2, R_2, \dots, s_k, R_k)$ (with $k \ge 0$), where each $s_i$ is a step (upstep or downstep) called a hinge and each $R_i$ is a Dyck path (possibly of length 0) called a range. The "internal ranges" $R_1, \dots, R_{k-1}$ are required to be complete Dyck paths; the "external ranges" $R_0$ and $R_k$ may be incomplete.

We may identify the hinged range with the path formed by concatenating its components; note 
that this need not be a Dyck path, as it may pass below the origin. 

If H is a hinged range and H' is formed by raising zero or more of its hinges (i.e., converting 
downstep hinges to upsteps), we say that H' is a raising of H or, equivalently, that H is a lowering 
of H'. We call a hinged range fully lowered (respectively, fully raised) if all its hinges are downsteps 
(respectively, upsteps). 


7.2 A bijection on Dyck paths 


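Theorem 7.20. Fix integers $n \ge 2$ and $1 \le \lambda_2 \le \lfloor n/2 \rfloor$. Define

$$\mathcal{W} = \{(W, s_1) : W \text{ is a length-}n\text{ Dyck path with exactly } \lambda_2 \text{ downsteps; } s_1 \text{ is a downstep in } W\}$$

and

$$\mathcal{W}' = \bigcup_{k=1}^{\lambda_2} \{(W', s'_1) : W' \text{ is a length-}n\text{ Dyck path with exactly } \lambda_2 - k \text{ downsteps; } s'_1 \text{ is an upstep in } W' \text{ with } k + 1 \le \mathrm{ht}(s'_1) \le \mathrm{ht}(W') - k + 1; \ s'_1 \text{ is the rightmost upstep in } W' \text{ of its height}\}.$$

Then there is an explicit bijection $f : \mathcal{W} \to \mathcal{W}'$ such that whenever $f(W, s_1) = (W', s'_1)$ it holds that $W' \mathrel{\unrhd\!\unrhd} W$.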

Remark 7.21. Each length-$n$ Dyck path with exactly $\lambda_2$ downsteps occurs exactly $\lambda_2$ times in $\mathcal{W}$. Each length-$n$ Dyck path with strictly fewer than $\lambda_2$ downsteps occurs exactly $n - 2\lambda_2 + 1$ times in $\mathcal{W}'$.

Proof of Theorem 7.20. Given any $(W, s_1) \in \mathcal{W}$, we define $f$'s value on it as follows. Let $s_2$ be the first downstep following $s_1$ in $W$ having height $\mathrm{ht}(s_1) - 1$; let $s_3$ be the first downstep following $s_2$ in $W$ having height $\mathrm{ht}(s_2) - 1$; etc., until reaching downstep $s_k$ having no subsequent downstep of smaller height. Now decompose $W$ as a (fully lowered) hinged range $H = (R_0, s_1, R_1, \dots, s_k, R_k)$. Let $H' = (R'_0, s'_1, R'_1, \dots, s'_k, R'_k)$ be the fully raised version of $H$ (where each $R'_j$ is just $R_j$ and each $s'_j$ is an upstep). Then $f(W, s_1)$ is defined to be $(W', s'_1)$, where $W'$ is the Dyck path corresponding to $H'$.

First we check that indeed $(W', s'_1) \in \mathcal{W}'$. As $W'$ is formed from $W$ by $k$ raisings, it has exactly $\lambda_2 - k$ downsteps. Since $\mathrm{ht}(s_k) \ge 0$ it follows that $\mathrm{ht}(s_1) \ge k - 1$ and hence $\mathrm{ht}(s'_1) \ge k + 1$. On the other hand, $\mathrm{ht}(s'_1) + (k - 1) = \mathrm{ht}(s'_k) \le \mathrm{ht}(W')$ and so $\mathrm{ht}(s'_1) \le \mathrm{ht}(W') - k + 1$. Finally, $s'_1$ is the rightmost upstep in $W'$ of its height because $H'$ is fully raised.

To show that $f$ is a bijection, we will define the function $g : \mathcal{W}' \to \mathcal{W}$ that will evidently be $f$'s inverse. Given any $(W', s'_1) \in \mathcal{W}'$, with $W'$ having exactly $\lambda_2 - k$ downsteps, we define $g$'s value on it as follows. Let $s'_2$ be the last (rightmost) upstep following $s'_1$ in $W'$ having height $\mathrm{ht}(s'_1) + 1$; let $s'_3$ be the last upstep following $s'_2$ in $W'$ having height $\mathrm{ht}(s'_2) + 1$; etc., until $s'_k$ is defined. That this $s'_k$ indeed exists follows from the fact that $\mathrm{ht}(s'_1) \le \mathrm{ht}(W') - k + 1$. Now decompose $W'$ as a (fully raised) hinged range $H' = (R'_0, s'_1, R'_1, \dots, s'_k, R'_k)$. The fact that $R'_k$ is a Dyck path (i.e., does not pass below its starting height) again follows from the fact that $\mathrm{ht}(s'_k) = \mathrm{ht}(s'_1) + k - 1 \le \mathrm{ht}(W')$. Finally, let $H = (R_0, s_1, R_1, \dots, s_k, R_k)$ be the fully lowered version of $H'$, and $W$ the corresponding path. As $W$ has exactly $\lambda_2$ downsteps, we may define $g(W', s'_1) = (W, s_1)$ provided $W$ is indeed a Dyck path. But this is the case, because the lowest point of $W$ occurs at the endpoint of $s_k$, and $\mathrm{ht}(s_k) = \mathrm{ht}(s_1) - k + 1 = \mathrm{ht}(s'_1) - 2 - k + 1 = \mathrm{ht}(s'_1) - k - 1 \ge 0$ since $\mathrm{ht}(s'_1) \ge k + 1$.

It is fairly evident that $f$ and $g$ are inverses. The essential thing to check is that the sequence $s_1, \dots, s_k$ determined from $s_1$ when computing $f(W, s_1)$ is "the same" (up to raising/lowering) as the sequence $s'_1, \dots, s'_k$ determined from $s'_1$ in computing $g(W', s'_1)$, and vice versa. The fact that the sequences have the same length follows, in the $g \circ f = \mathrm{id}$ case, from the fact that $\mathrm{ht}(W') = \mathrm{ht}(W) + 2k$; it follows, in the $f \circ g = \mathrm{id}$ case, from the fact that $R'_k$ is a Dyck path. The fact that the hinges have the same identity is evident from the nature of fully raising/lowering hinged ranges.

It remains to show that if $f(W, s_1) = (W', s'_1)$ then $W' \mathrel{\unrhd\!\unrhd} W$. Referring to Remark 7.18, we need to show that if $W'$ and $W$ are both simultaneously beheaded some number of times $b$, then in the resulting paths, $W'$ is at least as high as $W$ throughout their lengths. In turn, this is implied by the following more general statement:

Claim 7.22. After $b$ beheadings, $W'$ and $W$ may be expressed as hinged ranges $H' = (R_0, s'_1, R_1, \dots, s'_k, R_k)$ and $H = (R_0, s_1, R_1, \dots, s_k, R_k)$ (respectively) such that $H'$ is the fully raised version of $H$ (i.e., each $s'_j$ is an upstep).

(Note that we do not necessarily claim that $H$ is the fully lowered version of $H'$.)

The claim can be proved by induction on $b$. The base case $b = 0$ follows by definition of $f$. Throughout the induction we may assume that the common initial Dyck path $R_0$ is nonempty, as otherwise $s_1$ must be an upstep, in which case we can redefine the common initial Dyck path of $W$ and $W'$ to be $(s_1, R_1) = (s'_1, R_1)$.

We now show the inductive step. Assume $W'$ and $W$ are nonempty paths as in the claim's statement, with $R_0$ nonempty. Suppose now that $W'$ and $W$ are simultaneously beheaded. The first step of $W'$ and $W$ (an upstep belonging to $R_0$) is thus deleted, and the origin shifted. If $R_0$ contained a downstep to height 0 then the first such downstep is raised in both behead($W'$) and behead($W$) and the inductive claim is maintained. Otherwise, suppose $R_0$ contained no downsteps to height 0. It follows immediately that $W'$ originally had no returns to height 0 at all; hence the beheading of $W'$ is completed by the deletion of its first step. It may also be that $W$ had no returns to height 0 at all; then the beheading of $W$ is also completed by the deletion of its first step and the induction hypothesis is clearly maintained. On the other hand, $W$ may have had some downsteps to 0 within $(s_1, R_1, \dots, s_k, R_k)$. In this case, the first (leftmost) such downstep must occur at one of the hinges $s_j$, and the beheading of $W$ is completed by raising this hinge. The inductive hypothesis is therefore again maintained. This completes the induction. □
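For small $n$ the statement of Theorem 7.20 can be verified exhaustively. The following sketch is our own brute-force rendering, not the authors' code: paths are tuples of `+1`/`-1` steps, a marked step is given by its index, `f` raises the chain of hinges described in the proof, and `dominates` uses the beheading criterion of Remark 7.18.

```python
from itertools import product

def heights(path):
    out, h = [], 0
    for step in path:
        h += step
        out.append(h)
    return out

def dyck_paths(n, downs):
    """All length-n Dyck paths (as tuples) with exactly `downs` downsteps."""
    for path in product((1, -1), repeat=n):
        if path.count(-1) == downs and min(heights(path)) >= 0:
            yield path

def f(path, i):
    """Map (W, s_1), with s_1 the downstep at index i, to (W', s'_1)."""
    hts, hinges = heights(path), [i]
    while True:
        cur = hinges[-1]
        nxt = next((j for j in range(cur + 1, len(path))
                    if path[j] == -1 and hts[j] == hts[cur] - 1), None)
        if nxt is None:
            break
        hinges.append(nxt)
    raised = list(path)
    for j in hinges:
        raised[j] = 1                        # fully raise the hinged range
    return tuple(raised), i

def behead(path):
    """Path-level beheading (Proposition 7.17)."""
    rest, h = list(path[1:]), 0
    for i, step in enumerate(path):
        h += step
        if h == 0:
            rest[i - 1] = 1
            break
    return tuple(rest)

def dominates(wp, w):
    """W' stays at least as high as W after any number of simultaneous beheadings."""
    while w:
        if any(a < b for a, b in zip(heights(wp), heights(w))):
            return False
        wp, w = behead(wp), behead(w)
    return True

def domain(n, lam2):
    return [(w, i) for w in dyck_paths(n, lam2) for i in range(n) if w[i] == -1]

def codomain(n, lam2):
    out = set()
    for k in range(1, lam2 + 1):
        for wp in dyck_paths(n, lam2 - k):
            hts = heights(wp)
            for i in range(n):
                rightmost = all(not (wp[j] == 1 and hts[j] == hts[i])
                                for j in range(i + 1, n))
                if wp[i] == 1 and k + 1 <= hts[i] <= hts[-1] - k + 1 and rightmost:
                    out.add((wp, i))
    return out
```

Comparing `{f(W, i) for (W, i) in domain(n, lam2)}` with `codomain(n, lam2)` tests bijectivity, and `dominates` tests the domination property on each pair.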

We derive an immediate corollary, after introducing a bit of notation: 

Definition 7.23. We write $\mathrm{SYT}_n(=\lambda_2)$ (respectively, $\mathrm{SYT}_n(\le \lambda_2)$) for the set of 2-row standard Young tableaus of size $n$ with exactly (respectively, at most) $\lambda_2$ boxes in the second row.

Corollary 7.24. For any integers $n \ge 2$ and $0 < \lambda_2 \le \lfloor n/2 \rfloor$, there is a coupling $(Q, Q')$ of the uniform distribution on $\mathrm{SYT}_n(=\lambda_2)$ and the uniform distribution on $\mathrm{SYT}_n(\le \lambda_2 - 1)$ such that $Q' \mathrel{\unrhd\!\unrhd} Q$ always.

Proof. Let $(W, s_1)$ be drawn uniformly at random from the set $\mathcal{W}$ defined in Theorem 7.20, and let $(W', s'_1) = f(W, s_1)$. Let $Q \in \mathrm{SYT}_n(=\lambda_2)$, $Q' \in \mathrm{SYT}_n(\le \lambda_2 - 1)$ be the 2-row standard Young tableaus identified with $W$, $W'$ (respectively). Then Theorem 7.20 tells us that $Q' \mathrel{\unrhd\!\unrhd} Q$ always, and Remark 7.21 tells us that $Q$ and $Q'$ are each uniformly distributed. □

Corollary 7.25. For any integers $n \ge 0$ and $0 \le \lambda'_2 \le \lambda_2 \le \lfloor n/2 \rfloor$, there is a coupling $(Q, Q')$ of the uniform distribution on $\mathrm{SYT}_n(\le \lambda_2)$ and the uniform distribution on $\mathrm{SYT}_n(\le \lambda'_2)$ such that $Q' \mathrel{\unrhd\!\unrhd} Q$ always.

Proof. The cases $n < 2$ and $\lambda'_2 = \lambda_2$ are trivial, so we may assume $n \ge 2$ and $0 \le \lambda'_2 < \lambda_2 \le \lfloor n/2 \rfloor$. By composing couplings and using transitivity of $\unrhd\!\unrhd$, it suffices to treat the case $\lambda'_2 = \lambda_2 - 1$. But the uniform distribution on $\mathrm{SYT}_n(\le \lambda_2)$ is a mixture of (a) the uniform distribution on $\mathrm{SYT}_n(=\lambda_2)$, (b) the uniform distribution on $\mathrm{SYT}_n(\le \lambda_2 - 1)$; and these can be coupled to $\mathrm{SYT}_n(\le \lambda_2 - 1)$ under the $\unrhd\!\unrhd$ relation using (a) Corollary 7.24, (b) the identity coupling. □

Before giving the next corollary, we have a definition. 

Definition 7.26. Let $\mathcal{A}$ be any 2-letter alphabet. We write $\mathcal{A}^n_k$ for the set of length-$n$ strings over $\mathcal{A}$ with exactly $k$ copies of the larger letter, and we write $\mathcal{A}^n_{k,n-k} = \mathcal{A}^n_k \cup \mathcal{A}^n_{n-k}$.

Corollary 7.27. For $\mathcal{A}$ a 2-letter alphabet and integers $0 \le k' \le k \le \lfloor n/2 \rfloor$, there is a coupling $(w, w')$ of the uniform distribution on $\mathcal{A}^n_{k,n-k}$ and the uniform distribution on $\mathcal{A}^n_{k',n-k'}$ such that $w' \mathrel{\unrhd\!\unrhd} w$ always.

Proof. We first recall that if $x \sim \mathcal{A}^n_k$ is uniformly random and $(P, Q) = \mathrm{RSK}(x)$, then the recording tableau $Q$ is uniformly random on $\mathrm{SYT}_n(\le k)$. This is because for each possible recording tableau $Q \in \mathrm{SYT}_n(\le k)$ there is a unique insertion tableau $P$ of the same shape as $Q$ having exactly $k$ boxes labeled with the larger letter of $\mathcal{A}$. (Specifically, if $P \vdash (\lambda_1, \lambda_2)$, then the last $k - \lambda_2$ boxes of $P$'s first row, and all of the boxes of $P$'s second row, are labeled with $\mathcal{A}$'s larger letter.) It follows that the same is true if $x \sim \mathcal{A}^n_{k,n-k}$ is uniformly random. But now the desired coupling follows from Corollary 7.25 (recalling Definition 7.10). □

In fact, Corollary 7.27 is fundamentally stronger than our desired Theorem 7.1, as we now show: 
Proof of Theorem 7.1. For $r \in [0,1]$, suppose we draw an $r$-biased string $y \in \{1,2\}^n$ and define 
the random variable $j$ such that $y \in \{1,2\}^n_{j,n-j}$. (Note that given $j$, the string $y$ is uniformly 
distributed on $\{1,2\}^n_{j,n-j}$.) Write $L_r(\ell)$ for the cumulative distribution function of $j$: i.e., $L_r(\ell) = 
\Pr[y \in \bigcup_{j \le \ell} \{1,2\}^n_{j,n-j}]$, where $y$ is $r$-biased. 

Claim: $L_q(\ell) \ge L_p(\ell)$ for all $0 \le \ell \le \lfloor n/2 \rfloor$. 

Before proving the claim, let us show how it is used to complete the proof of Theorem 7.1. We 
define the required coupling $(w, x)$ of $p$-biased and $q$-biased distributions as follows: First we choose 
$\theta \in [0,1]$ uniformly at random. Next we define $k$ (respectively, $k'$) to be the least integer such that 
$L_p(k) \ge \theta$ (respectively, $L_q(k') \ge \theta$); from the claim it follows that $k' \le k$ always. Finally, we 
let $(w, x)$ be drawn from the coupling on $\{1,2\}^n_{k,n-k}$ and $\{1,2\}^n_{k',n-k'}$ specified in Corollary 7.27. 
Then as required, we have that $x \trianglerighteq w$ always, and that $w$ has the $p$-biased distribution and $x$ 
has the $q$-biased distribution. 
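The first two steps of this construction form a quantile coupling of the two level variables, which can be sketched numerically. This is an illustration only: the function names `L`, `quantile` and `coupled_levels` are hypothetical, $j$ is computed as $\min(h, n-h)$ where $h$ is the number of 2's, and the final step (drawing $(w, x)$ from the coupling of Corollary 7.27) is not implemented.

```python
from math import comb
import random


def L(r, n, ell):
    """CDF of j = min(h, n - h), where h ~ Binomial(n, r): Pr[j <= ell]."""
    return sum(
        comb(n, h) * r**h * (1 - r) ** (n - h)
        for h in range(n + 1)
        if min(h, n - h) <= ell
    )


def quantile(r, n, theta):
    """Least integer k in [0, floor(n/2)] with L_r(k) >= theta."""
    for k in range(n // 2 + 1):
        if L(r, n, k) >= theta:
            return k
    # Guard against floating-point round-off when theta is very close to 1.
    return n // 2


def coupled_levels(p, q, n, rng):
    """Draw one shared theta and return the pair (k, k')."""
    theta = rng.random()
    return quantile(p, n, theta), quantile(q, n, theta)
```

For example, with `rng = random.Random(0)`, repeated calls to `coupled_levels(0.4, 0.2, 9, rng)` should always return pairs with $k' \le k$, as the claim predicts when $q$ is further from $\frac{1}{2}$ than $p$.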

It therefore remains to prove the claim. We may exclude the trivial cases $\ell = \frac{n}{2}$ or $q \in \{0,1\}$, 
where $L_q(\ell) = 1$. Also, since $L_r(\ell) = L_{1-r}(\ell)$ by symmetry, we may assume $0 < q \le p \le \frac{1}{2}$. Thus it 
suffices to show that $\frac{d}{dr} L_r(\ell) \le 0$ for $0 < r \le \frac{1}{2}$. Letting $h$ denote the "Hamming weight" (number 
of 2's) in an $r$-biased random string on $\{1,2\}^n$, we have 


\[
L_r(\ell) = \Pr[h \le \ell] + \Pr[h \ge n - \ell] = 1 - \Pr[h > \ell] + \Pr[h > n - \ell - 1]
\]
\[
\Longrightarrow \quad \frac{d}{dr} L_r(\ell) = -\frac{d}{dr} \Pr[h > \ell] + \frac{d}{dr} \Pr[h > n - 1 - \ell].
\]

(The first equality used $\ell < \frac{n}{2}$.) But it is a basic fact that $\frac{d}{dr} \Pr[h > t] = n \binom{n-1}{t} r^t (1-r)^{n-1-t}$. 
Thus 

\[
\frac{d}{dr} L_r(\ell) = n \binom{n-1}{\ell} \left( -r^{\ell} (1-r)^{n-1-\ell} + r^{n-1-\ell} (1-r)^{\ell} \right),
\]
and we may verify this is indeed nonpositive: 

\[
-r^{\ell} (1-r)^{n-1-\ell} + r^{n-1-\ell} (1-r)^{\ell} \le 0 \iff 1 \le \left( \frac{1-r}{r} \right)^{n-1-2\ell},
\]

which is true since $0 < r \le \frac{1}{2}$ and $n - 1 - 2\ell \ge 0$ (using $\ell < \frac{n}{2}$ again). 
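The derivative computation above can be sanity-checked numerically. The sketch below is not from the paper: the names `L`, `dL_closed_form` and `dL_numeric` are hypothetical, the closed form is only meant for $\ell < \frac{n}{2}$, and the comparison uses a central finite difference with an arbitrarily chosen step size.

```python
from math import comb


def L(r, n, ell):
    """L_r(ell) = Pr[h <= ell] + Pr[h >= n - ell] for h ~ Binomial(n, r).

    Assumes ell < n/2, so that the two events are disjoint.
    """
    return sum(
        comb(n, h) * r**h * (1 - r) ** (n - h)
        for h in range(n + 1)
        if h <= ell or h >= n - ell
    )


def dL_closed_form(r, n, ell):
    """Closed-form derivative of L_r(ell) in r, valid for ell < n/2."""
    return n * comb(n - 1, ell) * (
        -(r**ell) * (1 - r) ** (n - 1 - ell)
        + r ** (n - 1 - ell) * (1 - r) ** ell
    )


def dL_numeric(r, n, ell, eps=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (L(r + eps, n, ell) - L(r - eps, n, ell)) / (2 * eps)
```

For instance, with $n = 7$ and $\ell = 2$, the two derivatives should agree to within the finite-difference error at any $r$ in $(0, \frac{1}{2}]$, and the closed form should be nonpositive throughout that range, vanishing at $r = \frac{1}{2}$.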